1. INTRODUCTION




1.1.  History 


During the 1983 JASON summer study we continued our investigation of techniques that
might be used to achieve a radical improvement in digital computer performance. In our previous
study we investigated "residue arithmetic" and "symbolic computing"[1]. In this report we will
discuss an extension of our "symbolic computing" into "source program transformations" and a
new topic, "reversible computing".


1.2. Scope

The general goal of our earlier study was to identify new approaches for computer system
development. The goal of this report is the same. We will try to identify critical mathematical
and computer concepts that could lead to a radical increase in future computer performance in
calculating important, and currently difficult, problems.


1.3.  Nature  of  Difficult  Computing  Problems 

There is a set of difficult computing problems that have great economic importance. This
domain is characterized by massive numerical calculations, symbolic calculations and search. For
example, the design of a modern VLSI circuit involves symbolic calculations (calculus, etc.),
numeric simulation of analog circuit properties, and search over a design space to find a
near-optimal (or sometimes just a feasible) solution. Today such problems are solved by a
combination of human labor and machine calculations. Symbolic calculations are done either by hand
or by use of a symbolic system such as MACSYMA[2]. These results are then converted by hand
into a FORTRAN program, or into parameters for a SPICE program[3] run. The results of this
analysis are then examined by hand, and new computer runs are made. Slowly,
a design for a VLSI chip evolves. A single VLSI design can cost tens of millions of
dollars for the human labor alone. The greatest difficulty in improving this process is the difficulty
of implementing an automatic search that is efficient, i.e., not exhaustive. As we shall see,
symbolic manipulation will aid the solution of this problem.




1.4.  Limitations  to  High  Performance 

The  primary  factors  that  limit  the  performance  of  today’s  computers  are: 

(1)  The  speed  of  light  limits  how  fast  signals  can  be  propagated  throughout  a  computer. 

(2)  The  serial  nature  of  computer  calculations  limits  parallel  execution. 

In the past, computer performance has been improved by improving the speed of the logic
gates; this approach is becoming more difficult every year as integrated circuit techniques
mature. Currently, the heat dissipation of logic circuits prevents them from being packed closer
together, and the resulting separation causes, due to the finite speed of light, an inherent
propagation delay that limits the performance of serial machines. To break this limitation, we need
either a totally new technology with radically smaller dimensions and more efficient power
dissipation, or new organizational principles for the parallel execution of computer algorithms,
or both.

1.5.  Possibilities  for  Radical  Improvement 

1.5.1.  Smaller  sized  components 

To  achieve  a  radical  improvement  then,  we  can  seek  radically  smaller  logical  components 
with  radically  improved  efficiencies.  A  speculative  approach  to  this  problem  will  be  considered 
later  in  this  report.  If  these  components  are  sufficiently  fast,  then  our  current,  serial  designs  for 
computers  will  suffice.  If  not,  then  concurrent  execution  techniques  will  be  needed  to  achieve  a 
radical  improvement. 

1.5.2.  Transformation  of  programs 

Since  humans  are  not  always  good  at  expressing  tasks  in  either  efficient  or  concurrent  form, 
we  will  need  to  develop  techniques  to  transform  source  programs.  This  will  require  symbolic 
manipulation  of  source  program  fragments. 

2. PROGRAM TRANSFORMATIONS

The main task of a compiler is to transform a source language into an object language.
Improving the efficiency of the resulting program is the purpose of compiler optimizations. The
ideas for this stem from compiler theory, structured programming[4], and artificial
intelligence[5, 6]. Cocke and Allen[7] discuss about twenty transformations for compilers. The
transformation of source programs into new source programs is not a new idea. It has been widely
advocated as a method of developing programs, of improving the efficiency of programs and of
discovering concurrency[8].

In the past, such efforts have only been partly successful. First, classic methods of
optimization in compilers have been very successful and have relieved some of the pressure for
source-to-source transformations. Second, classic source languages (such as FORTRAN) were ad-hoc
designs and the corresponding algebra of the programs was extraordinarily difficult. Backus, the
father of FORTRAN, has examined this problem and suggested some new directions[9]. The
computer language 'fp' was a result of this effort. Finally, it has been only recently that an
increasingly powerful drive to use parallel computers has existed. As a result, transformations to
convert serial constructs into concurrent ones are becoming increasingly important.

2.1.  Motivations 

2.1.1.  Program  development  techniques 

The "Operational Program Development" technique[10] is a good example of a new method
for developing software. The basic idea is to begin with a specification of a program that can be
executed, at least at a high level. Then this specification is transformed into a complete, detailed,
and efficient source program. The proponents of this method claim it is a fast, inexpensive, and
relatively error-free method of software engineering.

2.1.2.  Performance 

In  this  report,  our  primary  focus  is  on  performance.  Techniques  that  could  lead  to  a  radical 
improvement  are  of  especial  interest.  There  are  two  approaches  to  this  that  we  discuss  next. 

2.1.2.1. Algorithm Improvement

It is difficult (perhaps impossible) to prove that source-to-source transformations can always
provide a radical improvement, even to programs that are very large and highly structured.
However, some examples can provide evidence that such transformations have potential for a radical
improvement on very large and difficult computing problems. For example, consider the discrete
Fourier transform (DFT) of size N. It has a computational complexity of $O(N^2)$. This transform
can be transformed (by humans) into the fast Fourier transform (FFT) of complexity
$O(N \log N)$[11]. A typical value of N might be N = 1000, so that a hundred-fold improvement
results. Such a transformation is currently beyond the capabilities of any automatic system, but
as we shall see, such systems might be developed in the future.
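
The size of this gain can be made concrete with a small sketch. The Python below is our own illustration, not code from this study: `dft` evaluates the transform directly, with about N*N complex multiplications, and `fft` is a textbook recursive radix-2 form that assumes N is a power of two. Both function names are assumptions made for the example.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform: about N*N complex multiplications."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * freq * k / n) for k in range(n))
        for freq in range(n)
    ]


def fft(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for freq in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * freq / n) * odd[freq]
        out[freq] = even[freq] + twiddle
        out[freq + n // 2] = even[freq] - twiddle
    return out
```

Both functions compute the same values; only the amount of work differs. For N = 1024 the direct form needs about $10^6$ multiplications, while the recursive form needs on the order of $N \log_2 N$, about $10^4$.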

2.1.2.2. Algorithm concurrency

Sometimes it is easy for a programmer to envision the potential concurrency in a program
he is writing. The concurrency in a large matrix multiply is easy to see, for example. At other
times it is very difficult to envision and express concurrency even when the programmer knows it
must exist. For example, consider the addition of two very big numbers. We all have learned a
serial algorithm to perform such additions. It is universally known among programmers that
modern digital computers have parallel hardware to perform addition, yet almost no programmer
could describe in detail the highly concurrent algorithm that is embedded in the machine
hardware. The serial carry operation for a 32-bit adder requires approximately 64 time steps,
while the parallel (carry-lookahead) algorithm requires only about 10 time steps. Humans
sometimes can articulate only the serial form of a calculation to be performed. Hence the need for
automatic transformation of programs from serial to concurrent form.
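
The contrast between the two adders can be sketched as follows. This is an illustration under our own assumptions, not the circuit of any particular machine: `ripple_add` is the serial schoolbook algorithm, and `prefix_add` uses a parallel-prefix (Kogge-Stone style) formulation of carry lookahead, in which each pass of the loop stands for one level of logic that hardware would evaluate for all bit positions at once. The step counts are those of this sketch and are not meant to reproduce the figures of 64 and 10 quoted above.

```python
def ripple_add(a, b, width=32):
    """Serial addition: the carry moves one bit position per step."""
    carry, result, steps = 0, 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
        steps += 1
    return result, steps


def prefix_add(a, b, width=32):
    """Carry lookahead by parallel prefix: about log2(width) levels."""
    mask = (1 << width) - 1
    generate = a & b            # positions that create a carry
    propagate = a ^ b           # positions that pass a carry along
    half_sum = propagate
    shift, levels = 1, 0
    while shift < width:
        # Combine each group with the group `shift` positions below it.
        generate |= propagate & (generate << shift)
        propagate &= propagate << shift
        generate &= mask
        propagate &= mask
        shift <<= 1
        levels += 1
    carries = (generate << 1) & mask   # carry into each bit position
    return (half_sum ^ carries) & mask, levels
```

For 32-bit operands the serial form takes 32 carry steps and the prefix form takes 5 combining levels, while both return the same sum modulo $2^{32}$.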

2.2.  Requirements  for  Transformation 

A language must meet several requirements if it is to facilitate source-to-source
transformations. The language should be at a high enough level that important structures of the
original problem have not been lost. There also needs to be a clean algebra of the language that
describes permissible transformations^].

2.2.1.  Language 

In the past, the programming languages used in transformed programs
included both popular languages such as FORTRAN, where the need was great, and
languages with special features that made them attractive for transformation, for example
LISP[12], APL[13], and SETL[14]. More recently, it has been suggested that programming languages
should be designed with transformation in mind. John Backus has been the main proponent of
this and has proposed the language 'fp'[9]. The recently developed languages LUCID[15],
KRC[16], and PROLOG[17] all show a regard for the "algebra" of the language. In addition,
there have been attempts to improve the algebraic properties of popular languages; for example,
Loveman's work with a FORTRAN-like language[18]. We will choose PROLOG for our work in
this report.

2.3.  PROLOG 

2.3.1.  History 

Interest in the PROLOG language is growing very rapidly, roughly doubling every year.
Recently, in Japan, it was adopted as the primary language for the "Fifth-Generation Computer"
project. It is not yet very popular in the United States, probably due to the very mature
programming environment surrounding the LISP language and also to "NIH"1 factors. In the
discussion that follows, it is assumed that the reader has some familiarity with the PROLOG
language. For readers not familiar with PROLOG, the tutorial by Clocksin and Mellish[19] is
recommended. The basic idea of employing Predicate Calculus as a basis
for a programming language can be credited to Kowalski[20] (England). Since the algebra of
Predicate Calculus is especially well defined, programming languages based on it
are likely to have an especially well defined algebra as well. Colmerauer, in France, adapted the
theoretical ideas of Kowalski to a practical programming language, PROLOG[17].


1 NIH signifies 'Not Invented Here'.

2.3.2. Analysis

An analysis of the PROLOG language will reveal several important features. First, the
language is based on Predicate Calculus and inherits much of its formal structure. Second, any
useful computer language must deal with 'side-effects', particularly data input and output. These
'side-effects' disturb the otherwise clean algebra of the language. Third, PROLOG generally
subsumes the LISP language without any particular difficulty and without much violence to desirable
algebraic properties. Fourth, the 'cut' operator of PROLOG, a difficult construct to handle in
algebraic transformations, is used in several very distinct ways. The 'cut' does not modify the
semantics of the constructs it appears in, provided they contain no 'side-effect' operators. Its most
pervasive use is in situations where only one of several possible choices is to be made. This is
the 'CASE' statement of more familiar languages. Unfortunately, 'cut' is also used for much less
transparent constructs. The main purpose of the 'cut' is to improve the efficiency of the program by
preventing useless calculations. The 'cut' operator, in general, causes grave problems during
transformation of program fragments that contain it.

We have two reasons to employ transformations: first, we want to improve program
efficiency, and second, we want to increase program concurrency. Since "side-effects" generally
inhibit concurrency, it is natural for us to partition a PROLOG program into parts that are
separated by "side-effect" operations. Within the parts that are free of "side-effects", we can
freely apply a large repertoire of methods to achieve efficiency and concurrency. Unfortunately,
we cannot use many of our transformation methods on those parts that contain i/o and other
"side-effect" operations.


2.3.3.  PROLOG  Structure 

The structure of a PROLOG program is illustrated in Figure 1. At the top level is the data
base and the query. The data base is a collection of procedures. Procedures are collections of
clauses, all of which have the same name. Clauses are Horn clauses from the predicate calculus,
sometimes augmented with 'cut' operators. There are two kinds of clauses, 'facts' and 'rules'. A
fact has only a head consisting of a predicate name and any arguments surrounded by
parentheses. For example, 'f(0,0).' is a fact. A rule is a 'head', as described above, connected to
a 'body' by the IF operator ':-'. The body is a collection of 'goals', each of which has the syntax
of a 'head'. Thus, for example,

f(A,B) :- g(A,X), h(X,B).

is a rule that is read as "Function f with arguments A and B will be true IF function g of
arguments A and X is true AND if function h with arguments X and B is true." The variable
arguments, indicated by an initial letter that is upper case, are all assumed to be universally
quantified.

Figure 1. Prolog program structure.

Rules can be recursive, as in the following procedure to sum the positive integers up to N.


s(0,0).

s(N,S) :- M is N-1, s(M,G), S is N+G.


PROLOG constructs can be related to concepts of conventional programming languages as
shown in Table 1.


Table 1. Comparison of Programming Languages.

PROLOG         CONVENTIONAL LANGUAGES

cut            if-then-else; case
goal           procedure call
clause         entry point of a procedure
unification    assignment, data selector and constructor
recursion      iteration and recursion

The special form of PROLOG constructs is especially helpful in the program transformations to
be discussed below.


2.4.  PROLOG  Transformation  Rules 

PROLOG  supports  several  forms  of  mathematics.  We  have  previously  mentioned  that 
predicate  calculus  is  the  basic  form  of  the  language.  Table  2  illustrates  the  basic  definition  of  the 
predicate  functions  and  the  corresponding  PROLOG  functions. 


* In PROLOG, 'false' is replaced by 'fail'.


In predicate calculus there is a large set of transformation rules that can be directly
adapted to transform PROLOG fragments. Table 3 summarizes all these transformations,
and Table 4 gives the basic transformations in detail.


Table 3. Summary of PROLOG Transformations.

Classification           Components

Definitions              not, and, or, implies
Propositional Calculus   association, commutation, distribution, contraposition,
                         DeMorganization, negation
Resolvents               modus ponens, merge, tautology, chaining, equivalence
Recurrences              head, tail, mixed, multiple
Derivations              chaining
In-line                  expansions, contractions
Caching
Partial Compilation

Table 4. Basic Transformations.

NAME              Predicate Calculus            PROLOG (no side-effects)

Definition        ~(~Z)
Association       (X*Y)*Z <=> X*(Y*Z)
                  (XvY)vZ <=> Xv(YvZ)
Distribution      X*(YvZ) <=> (X*Y)v(X*Z)
                  Xv(Y*Z) <=> (XvY)*(XvZ)
Commutation
DeMorganization   ~(X*Y) <=> ~Xv~Y
                  ~(XvY) <=> ~X*~Y
Contrapositive                                  y:-x. u:-not(x).  <=>  y:-x. z:-not(y). u:-z.
Modus Ponens      P*(~PvQ) <=> Q                u:-not(p). u:-q. v:-p,u.  <=>  v:-q.
Merge             (PvQ)*(~PvQ) <=> Q            u:-p. u:-q. v:-not(p). v:-q. r:-u,v.  <=>  r:-q.
Tautology         (PvQ)*(~PvQ) <=> (Qv~Q)       u:-p,q. v:-not(p),not(q). r:-u. r:-v.  <=>  r.
                  (PvQ)*(Pv~Q) <=> (Pv~P)
Nil               ~P*P <=> nil
Chaining          (P⊃Q)*(Q⊃R) <=> P⊃R

Table 5 provides more detail.


Table 5. PROLOG Derived Transformations (Examples).

NAME                      Transformation

Chaining                  x:-ax.  y:-ay.  u:-x,y.
                          <=>
                          u:-ax,ay.

RECURRENCES
  head                    r(0).  r(A):-r(B),f(A,B),g(A,B).
  mixed                   r(0).  r(A):-f(A,B),r(B),g(A,B).
  tail                    r(0).  r(A):-f(A,B),g(A,B),r(B).
  multiple <=> single     r(0).  r(1).  r(A):-f(A,B),r(A),r(B).
                          <=>
                          s(0,1).  s(A,B):-f(A,B),s(A,B).  r(A):-s(A,B).

In-line
Expansion & Contraction   f.  r:-a,f,z.
                          <=>
                          r:-a,z.

                          r:-a,s,t,z.
                          <=>
                          f:-s,t.  r:-a,f,z.

Caching                   a(0).  a(1).  b(X):-a(Y), X is 2*Y.  b(0)?  b(1)?
                          <=>
                          a(0).  a(1).  b(0).  b(1):-fail.  b(X):-a(Y), X is 2*Y.

Partial Compilation       a(0).  a(1).  b(X):-a(Y), X is 2*Y.
                          <=>
                          a(0).  a(1).  b(0).  b(2).

For example, in predicate calculus, if a ⊃ b (read this as a implies b) and b ⊃ c, then it can be
concluded that a ⊃ c. Similarly, the PROLOG fragment

b :- a.
c :- b.

can be transformed (if there is no other use of b) into the simplified fragment:

c :- a.
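
The chaining step above, and the in-line expansion of Table 5, can both be viewed as unfolding a goal by its definition. The following Python sketch is our own illustration and is restricted to propositional clauses (no arguments, no unification). The representation of a program as a list of `(head, body)` pairs and the name `unfold` are assumptions of the example, and the unfolded name is assumed to be non-recursive and to have no other use.

```python
def unfold(program, name):
    """Replace every call to `name` by the bodies of its defining clauses,
    then drop those defining clauses.

    `program` is a list of (head, body) pairs; each body is a list of goals.
    """
    definitions = [body for head, body in program if head == name]
    result = []
    for head, body in program:
        if head == name:
            continue  # the definition disappears once it has been unfolded
        expansions = [[]]
        for goal in body:
            if goal == name:
                # One copy of the clause per defining clause of `name`.
                expansions = [e + d for e in expansions for d in definitions]
            else:
                expansions = [e + [goal] for e in expansions]
        result.extend((head, e) for e in expansions)
    return result


# b :- a.  c :- b.   becomes   c :- a.
chained = unfold([("b", ["a"]), ("c", ["b"])], "b")
```

Applied to the fragment `f:-s,t. r:-a,f,z.` the same function yields `r:-a,s,t,z.`, the in-line expansion of Table 5.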


PROLOG also supports the algebra (and arithmetic) of the reals (both integer and floating point)
within a goal. Thus a goal can be an arithmetic construction such as "Sum is A + 3 * B". In
the usual algebraic way, fragments such as
... , B is A + 1, Sum is A + 3 * B, ...
can be simplified (if there is no other use of B) into
... , Sum is 4 * A + 3, ...

2.5.  Recurrences 

Recurrences can appear in three forms. The following procedure to sum the positive
integers up to N is a mixed recurrence.

s(0,0).

s(N,S) :- M is N-1, s(M,G), S is N+G.

Tail recursion is an especially desirable form because it is very efficient in its use of
memory. It is equivalent to a 'DO' loop in FORTRAN. An example of tail recursion for the
same sum-the-positive-integers task is:

s(N,S) :- s(N,0,S).
s(0,S,S).

s(N,A,S) :- M is N-1, B is A+N, s(M,B,S).

Both of these recursions have roughly the same number of execution steps; however, the tail
recursion needs much less memory (space) and so is, all other factors being equal, more
desirable than the first form. Note the cost, however: the first form is a bit more compact and is
easier to comprehend.
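
The memory argument can be illustrated outside PROLOG. The Python sketch below is our own and mirrors the two procedures only loosely: `sum_mixed` performs its addition after the recursive call returns, so every pending call holds a stack frame, while `sum_tail` carries the partial sum in an accumulator and is written directly as the loop that a tail-recursive form is equivalent to. The function names are assumptions of the example.

```python
def sum_mixed(n):
    """Mirror of s(N,S): the addition waits for the recursive call to
    return, so the recursion depth (and stack use) grows with n."""
    if n == 0:
        return 0
    return n + sum_mixed(n - 1)


def sum_tail(n, acc=0):
    """Mirror of s(N,A,S): the accumulator holds the partial sum, so
    nothing remains to be done after the recursive step and the
    computation runs as a loop in constant space."""
    while n > 0:
        acc, n = acc + n, n - 1
    return acc
```

Both forms perform n additions. The first is limited by the available stack depth, while the second handles large n without difficulty.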

2.5.1.1. Head Recursion

Above we discussed general (or 'mixed') recursion and tail recursion. A form we will name
'head-recursion' will also be important for our transformations. Consider our previous
mixed-recursion example of summing the positive integers up to N. By transforming the first goal (an
arithmetic statement) into N is M + 1, and re-arranging the goals according to the predicate
calculus commutative rule, we obtain our 'head-recursion' form.

s(0,0).

s(N,S) :- s(M,G), N is M + 1, S is N + G.

Because  of  the  default  computation  rule  employed  by  the  PROLOG  evaluator,  this  is  not  an 
efficient  form  but  does,  of  course,  have  the  same  semantics  as  the  original  form.  This  ’head- 
recursion’  form  will  be  important  in  later  transformation  examples. 

2.6. Comments

These three forms illustrate the goal of our transformation method. We seek transformation
algorithms to convert at will between these forms.

3. EXAMPLES

In  this  chapter  we  will  examine  five  different  transformation  problems.  Each  represents  an 
important  process  for  achieving  efficiencies  and/or  concurrency. 

3.1.  Triple  Append 

This problem, in a LISP environment, has been examined by Wegbreit[21]. Because of the
differences between LISP and PROLOG, it is instructive to see how the PROLOG version differs
from the LISP one. The problem is to join together three lists, using the usual algorithm for
joining two lists. Then the result is transformed into a more efficient form. The definition for
appending one list to another is:

append([],Z,Z).
append([XH|XT],Y,[XH|ZT]) :- append(XT,Y,ZT). 2

In order to more easily manipulate our programs, we will use an abbreviated form as follows.

a([],Z,Z).
a([XH|XT],Y,[XH|ZT]) :- a(XT,Y,ZT).

The result is that list Z is the list Y appended to list X. To join three lists A, B, C into the single
list D, we employ the above PROLOG procedure and define our three-list append as:


b(A,B,C,D):-a(A,B,E),a(E,C,D). 

Now this is a perfectly acceptable program. It can be improved, however. Notice that first list A
must be traversed in order to append list B. Then in the next goal, list E, composed of lists A
and B, must be traversed to append list C. The cost is then $2 l_A + l_B$ procedure calls to a, where
$l_X$ is the length of list X. This cost might be reduced to $l_A + l_B$ by program transformation. To
do this consider two cases:

Case 1: A = []. List A is null.

b([],B,C,D) :- a([],B,E), a(E,C,D).

but a([],B,E) is true if E = B (by partial evaluation). Thus

b([],B,C,D) :- a(B,C,D).

Case 2: A ≠ [].

b([AH|AT],B,C,[DH|DT]) :- a([AH|AT],B,[EH|ET]), a([EH|ET],C,[DH|DT]).

but

a([AH|AT],B,[EH|ET]) :- a(AT,B,ET) IF EH = AH

(by partial evaluation). Thus

b([AH|AT],B,C,[AH|DT]) :- a(AT,B,ET), a(ET,C,DT).

Now by the derived-chaining rule, the r.h.s. is defined by the original definition of b:

b([AH|AT],B,C,[AH|DT]) :- b(AT,B,C,DT).

The new procedure is thus

b([],B,C,D) :- a(B,C,D).
b([AH|AT],B,C,[AH|DT]) :- b(AT,B,C,DT).

This is more efficient. It calls itself $l_A$ times and calls a $l_B$ times, a saving of $l_A$ calls.

2 In PROLOG, given a list, the notation [H|T] indicates that H is the first item of the list and T is the remainder.
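
The saving can be checked with a small call-counting sketch. The Python below is our own illustration using ordinary Python lists rather than PROLOG terms. The names `append2`, `append3_original` and `append3_transformed` are assumptions of the example, and the counts include the final call on the empty list, so each total is larger by 2 than the figures in the text.

```python
def append2(x, y, calls):
    """Two-list append, mirroring procedure a; one call per element of x
    plus one call for the empty list."""
    calls["a"] += 1
    if not x:
        return y
    return [x[0]] + append2(x[1:], y, calls)


def append3_original(a, b, c, calls):
    """b(A,B,C,D) :- a(A,B,E), a(E,C,D).  List A is traversed twice."""
    e = append2(a, b, calls)
    return append2(e, c, calls)


def append3_transformed(a, b, c, calls):
    """Transformed form: walk A once, then append B to C."""
    calls["b"] += 1
    if not a:
        return append2(b, c, calls)
    return [a[0]] + append3_transformed(a[1:], b, c, calls)


before = {"a": 0, "b": 0}
after = {"a": 0, "b": 0}
first = append3_original([1, 2, 3], [4, 5], [6], before)
second = append3_transformed([1, 2, 3], [4, 5], [6], after)
```

With $l_A = 3$ and $l_B = 2$ the original form makes 10 calls and the transformed form 7, a saving of $l_A = 3$ calls, in agreement with the analysis above.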

3.2.  Fibonacci  Recurrence 

In the next example, we define a procedure to calculate the Nth value of the Fibonacci
sequence. The Fibonacci sequence is

1, 1, 2, 3, 5, 8, ...

where $f_i = f_{i-1} + f_{i-2}$.

In PROLOG this is

fib(0,1).
fib(1,1).
fib(N,F) :- M is N-1, fib(M,G), L is N-2, fib(L,H), F is G+H.

This is very inefficient. The separate calculations of fib on the r.h.s. often calculate the same
values. It has a complexity of $O(2^N)$. We will transform it to a more efficient form. For
notational purposes we will compress the above procedure as follows.

f(0,1).

f(1,1).

f(N,G+H) :- f(N-1,G), f(N-2,H).

Now let us define a new clause to represent the r.h.s.

g(N-1,G,N-2,H) :- f(N-1,G), f(N-2,H).

thus

g(N,F,N-1,G) :- f(N,F), f(N-1,G).
f(N,G+H) :- g(N-1,G,N-2,H).

and

g(1,1,0,1) :- f(1,1), f(0,1).

now since

f(N-1,G) :- f(N-1,G).

f(N,F) :- f(N-1,G), f(N-2,H), F is G+H,

then by the derived-chaining rule,

f(N,F), f(N-1,G) :- f(N-1,G), f(N-2,H), F is G+H

thus

g(N,F,N-1,G) :- g(N-1,G,N-2,H), F is G+H.

In canonical form:

g(1,1,0,1).

g(N,F,M,G) :- M is N-1, L is M-1, g(M,G,L,H), F is G+H.

Note that L now has no particular function. Thus

g(1,1,1).

g(N,F,G) :- M is N-1, g(M,G,H), F is G+H.

This is a much improved algorithm, with a complexity of $O(N)$. It makes only about N calls,
far fewer than the original algorithm. It is possible to find an algorithm
that is of only $O(\log_2 N)$ complexity, as opposed to the $O(N)$ complexity of our present algorithm.
It turns out that for large N (approximately N = 50) this new algorithm is an improvement. The
basic idea of the new algorithm is to turn the 'mixed-recurrence' into a 'head-recurrence' and
solve it using matrix techniques. The improved recurrence derived above is:

g(M,F,G) :- L is M-1, g(L,G,H), F is G+H.

This is equivalent to:

g(1,M,F,G) :- L is M-1, g(1,L,G,H), F is G+H.

g(1,M,F,G) :- g(1,L,G,H), M is L+1, F is G+H.

g(1,M,F,E) :- g(1,L,G,H), M is L+1, F is G+H, E is G.

g(1,M,F,E) :- g(1,L,G,H),
    1 is 1,
    M is L+1,
    F is G+H,
    E is G.

We now solve for the initial condition:

g(1,1,1,1).

Now, converting to matrix form:

$$
\begin{pmatrix} 1 \\ M \\ F \\ E \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}^{M-1}
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
$$

The matrix power can be evaluated by successive squarings with only $O(\log M)$ complexity.
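
The three stages of this example can be compared in a short sketch. The Python below is our own illustration, not the output of any transformation system. `fib_naive` mirrors the original doubly recursive procedure and counts its calls, `fib_linear` mirrors the transformed procedure g(N,F,G), and `fib_matrix` raises the 4 by 4 step matrix to the power N-1 by repeated squaring. All function names are assumptions of the example, the sequence is indexed so that f(0) = f(1) = 1 as in the text, and the last two functions assume N is at least 1.

```python
def fib_naive(n, calls):
    """Original form: two recursive calls per step; calls[0] counts them."""
    calls[0] += 1
    if n < 2:
        return 1
    return fib_naive(n - 1, calls) + fib_naive(n - 2, calls)


def fib_linear(n):
    """Transformed form g(N,F,G): carry the pair (f(N), f(N-1))."""
    f, g = 1, 1                    # g(1,1,1): f(1) = 1, f(0) = 1
    for _ in range(n - 1):
        f, g = f + g, f
    return f


def mat_mul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(m, e):
    """Matrix power by repeated squaring: about log2(e) squarings."""
    size = len(m)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while e > 0:
        if e & 1:
            result = mat_mul(result, m)
        m = mat_mul(m, m)
        e >>= 1
    return result


STEP = [[1, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 0]]


def fib_matrix(n):
    """Apply STEP**(n-1) to the initial column (1,1,1,1); return F."""
    power = mat_pow(STEP, n - 1)
    state = [sum(row) for row in power]   # product with a column of ones
    return state[2]
```

For N = 5 all three return 8. The naive form needs 15 calls, the linear form 4 steps, and the matrix form a number of squarings that grows only with $\log_2 N$, although each squaring is itself a fixed amount of arithmetic on a 4 by 4 matrix.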


3.3.  Arithmetic  Series 

This example illustrates the power that transformation techniques can sometimes achieve.
Consider a general arithmetic series involving polynomials. Such a form often appears in the 'do'
loops of FORTRAN programs. An example as it would appear in a language similar to
FORTRAN is:


f(0) = 0.
do ( n <- 1,N )
{f(n) <- f(n-1) + n**4 - 4*n**3 + 3*n**2 - 2*n + 1}.

This can be expressed in PROLOG as:

f(0,0).

f(N,F) :- M is N - 1,
    f(M, F1),
    F is F1 + N*N*N*N - 4*N*N*N + 3*N*N - 2*N + 1.

Note  that  the  complexity  of  this  program  is  15N  arithmetic  operations. 

It is possible to automatically transform any such arithmetic series by means of a PROLOG
program written by Peter Van Roy (unpublished). This program is provided in Appendix A-1.

The improved form for f as automatically generated by Van Roy's program is:

f(_3,_6):-_6 is
((((-6*_3+15)*_3+20)*_3+15)*_3+ -14)*_3/-30.

In more readable form this is:

f(N,F):- F is
((((-6*N+15)*N+20)*N+15)*N-14)*N/-30.

Note that this solution is much simplified. Its complexity is only 10 arithmetic operations,
independent of the value of N.
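
The closed form can be checked against the original loop. The Python sketch below is our own verification aid and has no connection to Van Roy's program. `series_loop` follows the 'do' loop term by term and `series_closed` evaluates the generated expression in integer arithmetic, and both names are assumptions of the example.

```python
def series_loop(n_max):
    """Sum the polynomial term by term, as in the FORTRAN-like loop."""
    total = 0
    for n in range(1, n_max + 1):
        total += n**4 - 4 * n**3 + 3 * n**2 - 2 * n + 1
    return total


def series_closed(n):
    """Closed form in Horner style: 5 multiplications, 4 additions and
    1 division, independent of n."""
    numerator = ((((-6 * n + 15) * n + 20) * n + 15) * n - 14) * n
    return numerator // -30   # the numerator is always a multiple of 30
```

For example, both functions return -1 for N = 1 and -8 for N = 2, and they agree for every N we tried.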

3.4.  Head  Caching 

In  a  simple  PROLOG  interpreter,  once  a  goal  fails,  all  the  context  of  its  sub-goals,  both 
those  that  failed  and  those  that  succeeded,  is  destroyed  by  popping  both  the  environment  and  the 
value  stacks. 

Later,  if  these  same  sub-goals  are  encountered,  all  the  work  of  proving  them  must  be 
repeated.  This  work  could  be  avoided  by  storing  the  heads  of  the  clauses  and  the  result  of  the 
evaluation.  In  effect  we  wish  to  add  a  new  clause  to  the  data  base. 

It can be seen that this technique enhances performance at the cost of memory space (to
store the cache entries). The modern trend seems to be ever decreasing costs for memory, so this
may be an attractive method to improve performance for some applications that are more
time-bound than space-bound.

This  technique  is  not  a  general  one  that  should  be  universally  applied.  For  example  if  a 
procedure  is  being  used  as  a  generator  of  values,  it  is  not  appropriate  to  cache  its  intermediate 
results  because  they  could  be  scrambled  (in  order  of  appearance)  by  caching.  Also,  if  side-effect 
operations  occur,  such  as  assert  or  read,  then  caching  can  change  the  expected  behavior  of  the 
executing  program.  It  is  also  true  that  some  cached  values  are  much  more  valuable  than  others, 
so  some  selectivity  in  caching  is  desirable  to  optimally  utilize  memory  space. 

Our method of overcoming these drawbacks is to add a new mode declaration to the
PROLOG interpreter language. Warren[22] used this method when he introduced his operator
"mode". Clark and McCabe[23] elaborated on this and introduced several more mode operators.
We propose a "cache" mode operator to declare the desirability of caching a named procedure.
Thus if we wished to cache the results of the "ancestor" procedure, we would include
"cache(ancestor)." in the program.

As a result, the programmer can select for caching only those procedures that are likely to
be greatly improved by it. Procedures at too low a level to benefit from caching, generator
procedures, and procedures with side-effects can be excluded from caching in the most natural
way.

In order to illustrate the potential of this method, we will examine an example
program, a procedure to determine if two people are related. It is:

related(X,Y) :- ancestor(X,Y).
related(X,Y) :- ancestor(Y,X).
related(X,Y) :- ancestor(X,Z), ancestor(Y,Z).

Thus two people, X and Y, are related if one is the ancestor of the other or if they have a common
ancestor, Z. In order to compress the bulk of the programs to follow, we will reduce all the names
in our example to single letters, the first letter of each name. The compressed program follows,
where m represents male, c represents child, and f represents father. The letters t, j, g, b, and v
all represent individual people.


m(t).
m(j).
m(g).
m(b).

c(v,g).
c(g,b).
c(j,g).
c(t,j).

f(X,D):-m(D),c(X,D).

a(A,X):-f(A,X).
a(A,X):-f(Y,X),a(A,Y).

r(X,Y):-a(X,Y).
r(X,Y):-a(Y,X).
r(X,Y):-a(X,Z),a(Y,Z).

As a measure of performance we will count calls to clauses. This is roughly proportional to
the number of logical inferences (LI), since for our example there are about two logical
inferences per clause.

For purposes of illustration, we will be interested in a compound query: ‘Is t related to v?’
followed by ‘Is v related to t?’. We wish to know the number of calls required to answer this
query.

It is a simple matter to instrument our example program and count each call. We define a
count procedure ‘cnt(x)’ and call it just as we start the execution of the body of the clause
(see Appendix A-2). A much more elaborate version of this will be discussed later. When the
query is then executed, the first half of the query, ‘r(t,v)?’, requires 59 calls and the
second, ‘r(v,t)?’, 35 calls, for 94 total.

How much could caching reduce this figure? One way to find out would be to re-write a
PROLOG interpreter to include the "cache" operator as discussed above. In this study, however,
we re-wrote the example program to call a simulated cache system. One might imagine that each
of the statements was specified to be cached, and the ‘logical’ consequence is the re-written
program.

Footnote: LI, for Logical Inferences, seems to have become the accepted measure of work in
executing logic programs. The more common form is ‘LIPS’, for ‘Logical Inferences per Second’.
We assume an LI is the unification of a simple variable.

This system was implemented to produce the measurements of the numbers of calls and a
trace during execution of queries. The simulated cache was written in PROLOG and our
measurements were made with a conventional PROLOG interpreter. The re-written example is:


m(t) :- cnt(m).
m(j) :- cnt(m).
m(g) :- cnt(m).
m(b) :- cnt(m).
c(v,g) :- cnt(c).
c(g,b) :- cnt(c).
c(j,g) :- cnt(c).
c(t,j) :- cnt(c).

f(X,D) :- hf(f,X,XP,D,DP), m(D), c(X,D), tf(f,X,XP,D,DP).
a(A,X) :- hs(a,A,AP,X,XP), f(A,X), ts(a,A,AP,X,XP).
a(A,X) :- hf(a,A,AP,X,XP), f(Y,X), a(A,Y), tf(a,A,AP,X,XP).
r(X,Y) :- hs(r,X,XP,Y,YP), a(X,Y), ts(r,X,XP,Y,YP).
r(X,Y) :- hs(r,X,XP,Y,YP), a(Y,X), ts(r,X,XP,Y,YP).
r(X,Y) :- hf(r,X,XP,Y,YP), a(X,Z), a(Y,Z), tf(r,X,XP,Y,YP).

cache(f).
cache(a).
cache(r).


The  cache  simulation  program  can  be  found  in  Appendix  A-2. 

There are four kinds of calls to the cache system: hs, hf, ts, and tf. For a procedure with
only a single clause, such as ‘f’, we employ ‘hf’ and ‘tf’. The first, ‘hf’, creates a cache
entry that represents the failure of this clause with the variables bound as they were when
the clause was called. If the clause indeed fails, then nothing further happens and the cache
entry remains. On the other hand, if the clause should succeed, then this cache entry must be
replaced by an entry representing success, but with the new bindings determined by the body of
the clause. This is the function of ‘tf’, which appears at the end of each clause. If
backtracking within the clause occurs, then each successful result must also be entered into
the cache.

For multiple clause procedures, only the last clause can indicate a failure, so there is no
caching at the head of any clause except the last clause of a procedure. The call ‘hs’ is used
only for instrumentation.

The call ‘ts’ is similar to ‘tf’, but since no "fail" was entered into the cache for this
clause, none needs to be removed.

The ‘primed’ variables (XP, YP) that appear in the cache calls are needed because the
variable bindings cached by the "fail" at the beginning of a clause are not the same as those
cached by the "succeed" at the end. Thus to remove a previously cached "fail", those bindings
must be propagated from ‘hf’ to ‘tf’.

In inserting a new entry into the cache, duplicates (if any) should be removed. Also, if a
more general result is cached, any results it subsumes should be deleted. For example, if
a(j,t):-fail. is initially in the cache when a(X,t):-fail. is to be cached, a(j,t):-fail.
should be removed, as it is dominated by a(X,t):-fail.
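
The insertion rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the cache program of Appendix A-2: the entry layout (a functor name plus a tuple of arguments, with `None` standing for an unbound variable) and the names `subsumes` and `insert_entry` are hypothetical, and repeated variables such as a(X,X) are not modelled.

```python
def subsumes(general, specific):
    """True if `general` covers `specific` (None marks an unbound argument)."""
    g_name, g_args = general
    s_name, s_args = specific
    return (
        g_name == s_name
        and len(g_args) == len(s_args)
        and all(g is None or g == s for g, s in zip(g_args, s_args))
    )


def insert_entry(cache, entry):
    """Return a new cache with `entry` added.

    Duplicates of `entry` and entries dominated by it are dropped first,
    following the rule stated in the text.
    """
    kept = [old for old in cache if not subsumes(entry, old)]
    kept.append(entry)
    return kept


# a(j,t):-fail. is dominated by a(X,t):-fail. and disappears on insertion.
failures = [("a", ("j", "t")), ("a", ("v", "j"))]
failures = insert_entry(failures, ("a", (None, "t")))
```

After the last line, `failures` holds the unrelated entry for a(v,j) followed by the general entry for a(X,t).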

The cache program that accomplishes the above objectives is shown in Appendix A-2.

The performance results of our example with the cache system are shown in Table 6.


Table 6. Summary of Results.

                                TOTAL CALLS
                                          CACHE
INITIAL CALL     No Cache     r(t,v), r(v,t)     r(v,t), r(t,v)
r(v,t)              35               5                 32
r(t,v)              59              47                  5
Both                94              52                 37

Only  37  total  calls  are  required  in  the  cached  system  as  compared  to  the  94  required  in  the 
uncached  system.  It  is  interesting  to  note  that  if  r(v,t)  is  called  before  r(t,v),  then  52  calls  are 
needed.  The  cache  scheme  clearly  saves  calls  in  this  simple  example. 
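
The effect of caching on call counts can be illustrated with a small Python sketch of the same family data. It is an assumption-laden analogue, not the PROLOG experiment: the class `Family`, the function `count_calls`, and the choice to count evaluations of the ancestor body are all hypothetical, so its counts are not comparable to the figures in Table 6.

```python
MALE = {"t", "j", "g", "b"}
CHILD = [("v", "g"), ("g", "b"), ("j", "g"), ("t", "j")]  # (child, parent)


class Family:
    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.cache = {}        # person -> set of ancestors already computed
        self.body_calls = 0    # how many times the ancestor body actually ran

    def ancestors(self, x):
        if self.use_cache and x in self.cache:
            return self.cache[x]
        self.body_calls += 1
        found = set()
        for child, parent in CHILD:
            if child == x and parent in MALE:  # parent is the father of x
                found |= {parent} | self.ancestors(parent)
        if self.use_cache:
            self.cache[x] = found
        return found

    def related(self, x, y):
        ax, ay = self.ancestors(x), self.ancestors(y)
        return y in ax or x in ay or bool(ax & ay)


def count_calls(use_cache):
    """Run the compound query r(t,v), r(v,t) and report answers and body calls."""
    fam = Family(use_cache)
    answers = (fam.related("t", "v"), fam.related("v", "t"))
    return answers, fam.body_calls
```

In this sketch the compound query evaluates the ancestor body 14 times without the cache and 5 times with it; the second query is answered entirely from cached results.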


The  state  of  the  program  after  the  query  is  shown  below. 


EXECUTED PROGRAM LISTING

m(t) :- cnt(m).
m(j) :- cnt(m).
m(g) :- cnt(m).
m(b) :- cnt(m).

c(v,g) :- cnt(c).
c(g,b) :- cnt(c).
c(j,g) :- cnt(c).
c(t,j) :- cnt(c).

f(j,g) :- cnt(f).
f(v,g) :- cnt(f).
f(t,j) :- cnt(f).
f(_1512,_1513) :-
    eq(_1512,t), eq(_1513,v), !, fail.
f(_1512,_1513) :-
    eq(_1512,_1522), eq(_1513,t), !, fail.
f(_1512,_1513) :-
    eq(_1512,v), eq(_1513,j), !, fail.
f(_1512,_1513) :-
    eq(_1512,_1522), eq(_1513,v), !, fail.
f(_1512,_1513) :-
    hf(f,_1512,_1522,_1513,_1523),
    m(_1513),
    c(_1512,_1513),
    tf(f,_1512,_1522,_1513,_1523).

a(v,g) :- cnt(a).
a(t,g) :- cnt(a).
a(t,j) :- cnt(a).
a(_1516,_1517) :-
    eq(_1516,t), eq(_1517,v), !, fail.
a(_1516,_1517) :-
    eq(_1516,v), eq(_1517,j), !, fail.
a(_1516,_1517) :-
    eq(_1516,v), eq(_1517,t), !, fail.
a(_1516,_1517) :-
    hs(a,_1516,_1526,_1517,_1527),
    f(_1516,_1517),
    ts(a,_1516,_1526,_1517,_1527).
a(_1516,_1517) :-
    hf(a,_1516,_1526,_1517,_1527),
    f(_1528,_1517),
    a(_1516,_1528),
    tf(a,_1516,_1526,_1517,_1527).

r(v,t) :- cnt(r).
r(t,v) :- cnt(r).
r(_1520,_1521) :-
    hs(r,_1520,_1530,_1521,_1531),
    a(_1520,_1521),
    ts(r,_1520,_1530,_1521,_1531).
r(_1520,_1521) :-
    hs(r,_1520,_1530,_1521,_1531),
    a(_1521,_1520),
    ts(r,_1520,_1530,_1521,_1531).
r(_1520,_1521) :-
    hf(r,_1520,_1530,_1521,_1531),
    a(_1520,_1532),
    a(_1521,_1532),
    tf(r,_1520,_1530,_1521,_1531).

count(level,2).
count(a,2).
count(r,3).
count(_1522,0).

CALL COUNTS

Total calls = 5

Notice how it has been transformed. Both ‘success-goals’ and ‘failure-goals’ are evident, as
are the original clauses. This idea of transforming a PROLOG program leads us to another
related technique.

3.5.  Partial  Compilation 

If all possible results of executing a procedure, called with all of its arguments unbound,
are cached, then in some sense we have transformed the procedure into a ‘partially-compiled’
form that executes particular queries very quickly. It may, of course, use enormous memory
space. Again, selective programmer control of such a facility could be effective. Thus we
propose the mode "pcompile(Procedure-name)."

To illustrate the potential of this technique, we will include the statement "pcompile(a)"
with the example program, and the following procedures with the cache program.


% Partial-compiler for PROLOG Example.

% head(X,Y,Z) :- X(Y,Z).

% compensation for the principal
% functor not being a variable

head(f,Y,Z) :- f(Y,Z).
head(a,Y,Z) :- a(Y,Z).
head(r,Y,Z) :- r(Y,Z).

head(fp,Y,Z) :- fp(Y,Z).
head(ap,Y,Z) :- ap(Y,Z).
head(rp,Y,Z) :- rp(Y,Z).

% compiler

p_compile(all) :- pcompile(X), p_compile(X), fail.
p_compile(X) :- head(X,Y,Z), pexec(X,Y,Z), fail.
p_compile(X) :- abolish(X,2), restore(X), !.

pexec(X,Y,Z) :- trans(X,XP), ycache(XP,Y,Z), !.

restore(X) :- trans(X,XP), rts(XP,Y,Z), ass(X,Y,Z), restore(X).
restore(X).

trans(f,fp).
trans(a,ap).
trans(r,rp).

ass(fp,YQ,ZQ) :- asserta((fp(YQ,ZQ))).
ass(ap,YQ,ZQ) :- asserta((ap(YQ,ZQ))).
ass(rp,YQ,ZQ) :- asserta((rp(YQ,ZQ))).

rts(fp,YQ,ZQ) :- retract((fp(YQ,ZQ))).
rts(ap,YQ,ZQ) :- retract((ap(YQ,ZQ))).
rts(rp,YQ,ZQ) :- retract((rp(YQ,ZQ))).

Execution of the compiler results in the following transformed program. Not counting the
compiler itself but only the original program, 72 calls are required during the compilation.


EXECUTED PROGRAM LISTING

m(t) :- cnt(m).
m(j) :- cnt(m).
m(g) :- cnt(m).
m(b) :- cnt(m).

c(v,g) :- cnt(c).
c(g,b) :- cnt(c).
c(j,g) :- cnt(c).
c(t,j) :- cnt(c).

f(_41,_42) :-
    hf(f,_41,_51,_42,_52),
    m(_42),
    c(_41,_42),
    tf(f,_41,_51,_42,_52).

a(t,j) :- cnt(a).
a(v,g) :- cnt(a).
a(j,g) :- cnt(a).
a(g,b) :- cnt(a).
a(t,g) :- cnt(a).
a(v,b) :- cnt(a).
a(j,b) :- cnt(a).
a(t,b) :- cnt(a).

r(_49,_50) :-
    hs(r,_49,_59,_50,_60),
    a(_49,_50),
    ts(r,_49,_59,_50,_60).
r(_49,_50) :-
    hs(r,_49,_59,_50,_60),
    a(_50,_49),
    ts(r,_49,_59,_50,_60).
r(_49,_50) :-
    hf(r,_49,_59,_50,_60),
    a(_49,_61),
    a(_50,_61),
    tf(r,_49,_59,_50,_60).

count(m,20).
count(level,7).
count(f,18).
count(a,18).
count(c,16).
count(_51,0).
count(_51,0).
count(_51,0).
count(_51,0).
count(_51,0).

CALL COUNTS

Total calls = 72


Now  the  compound  query  requires  (without  further  caching)  11  calls  as  compared  to  the  94 
original  calls.  Note  that  even  for  this  query,  fewer  total  calls  are  needed  (72  +  11  =  83). 
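
The idea behind partial compilation, tabulating every solution of a procedure called with unbound arguments and then answering queries by lookup, can be sketched in Python. This is a hedged analogue rather than the PROLOG compiler above: the names `compile_ancestor` and `related_compiled` are hypothetical, and the table is built by simple iteration to a fixed point instead of by backtracking.

```python
MALE = {"t", "j", "g", "b"}
CHILD = [("v", "g"), ("g", "b"), ("j", "g"), ("t", "j")]  # (child, parent)


def father_pairs():
    """All (X, D) with f(X, D): D is male and X is a child of D."""
    return {(x, d) for x, d in CHILD if d in MALE}


def compile_ancestor():
    """Tabulate every solution of a(A, X) as a set of ground facts."""
    table = set(father_pairs())            # a(A,X) :- f(A,X).
    changed = True
    while changed:                         # a(A,X) :- f(Y,X), a(A,Y).
        changed = False
        for y, x in father_pairs():
            for a, y2 in list(table):
                if y2 == y and (a, x) not in table:
                    table.add((a, x))
                    changed = True
    return table


def related_compiled(x, y, table):
    """Answer r(X, Y) using only lookups in the compiled table."""
    if (x, y) in table or (y, x) in table:
        return True
    return any((x, z) in table and (y, z) in table for _, z in table)


ANCESTOR_TABLE = compile_ancestor()
```

The table holds the same eight ground facts for ‘a’ that appear in the transformed listing, so later queries never re-derive an ancestor relation.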

It is a waste of effort to also compile ‘f’ once ‘a’ is compiled. Some speed up (from 11 to 2
calls) could result if ‘r’ were compiled, but this would cost considerable space for little
gain. It can be seen that selective pseudo-compiling can sometimes be very helpful in improving
performance.

For  similar  reasons,  employing  caching  after  compiling  ‘a’  would  not  help  performance. 

3.6. Comments

The  above  examples  illustrate  the  potential  of  the  transformation  techniques.  However,  all 
of  the  examples  were  quite  simple  and  half  of  the  examples  were  transformed  by  hand,  not 
automatically.  The  problem  of  automatically  controlling  which  transformations  should  be  applied 
is  a  very  difficult  open  problem. 

A future goal is the automatic transformation of the DFT algorithm into the FFT algorithm.
Another is the development of the Strassen algorithm[24]. Both of these problems have a
common background; roughly speaking, they both appear to be connected to certain questions in
the theory of representations of finite groups, a fairly well developed body of mathematical
knowledge. To solve these and related transformation problems, a program, similar to the Van
Roy solver program discussed above, would need to be developed. Such a program would have to
know not only all the facts about group representations but would also have to be able to sense
that this particular area held some facts and techniques which might be relevant to the problem
at hand. Such a program is likely to be very complex. It is not at all clear that it could be
developed as an expert (mathematician) system, even if massive resources could be provided. It
may even be the case that such a program could not be developed without some new breakthrough
in the theory of artificial intelligence, or the development of some new kind of mathematics.

On  the  other  hand,  it  is  possible  that  a  clever  mathematician  or  computer  scientist  just 
might  discover  a  new  approach.  Such  a  discovery  could  have  tremendous  consequences  for  high 
performance  computing. 

3.7.  Conclusions  for  Transformations 

The  above  examples  achieved  performance  enhancements  ranging  from  speed-ups  of  1.5  to 
3000,  on  very  simple  problems.  However,  it  is  true  that  these  examples  constitute  plausibility 
arguments,  not  proofs,  that  transformation  techniques  may  be  important  for  achieving  a  radical 
improvement  in  performance.  To  have  a  big  impact,  such  transformations  would  need  to  be 
automatically  controlled  during  program  execution,  so  that  as  more  elements  of  the  solution  are 
developed, new transformations can be applied. There is currently very little theory to guide
such  dynamic  applications  of  transformations.  If  future  research  is  able  to  accomplish  this,  then  a 
radical  improvement  in  performance  could  indeed  occur. 

4. REVERSIBLE COMPUTING

4.1.  Introduction 

Reversible computing was investigated in an attempt to determine if radically smaller and
more efficient logical circuits might be possible. As we shall see, density and heat dissipation
improvements up to ~10^8 may someday be possible. We consulted one of the pioneers in this
subject, Dr. Edward Fredkin of MIT, at some length and brought ourselves up to date on the (not
very extensive) literature. Our basic conclusion is that the importance of reversible logic
depends crucially on the physical architecture of the computer: It is irrelevant to the current
scheme in which packets of charge are stored on, and moved between, structures of order one
light wavelength in size, but might be relevant and even essential if the basic
information-handling units were of molecular or atomic size (a distant but not necessarily
unattainable goal). The question of physical realization of reversible logic elements has been
almost completely neglected⁴ in favor of the abstract questions of how, given the existence of
reversible logic elements, one could wire them up to make a useful computer and how one would
program it. We think that the problem of how to physically realize reversible computation at
something like the atomic scale should be the next question to be attacked in this area. We
also think that the very framework of reversible logic suggests some interesting new approaches
to the problem of ultra-small-size computing elements which might be worth exploring for their
own sake. Although practical payoff on any of these ideas is surely far off, the computer
science and physics issues raised are fascinating and of fundamental importance.

4.2.  Energy  Dissipation  In  Computing 

Contemporary computers dissipate at least 10^-12 joules (about 10^8 kT if T equals room
temperature) per logical operation. The reason is that bits are stored as charges on capacitors
charged to about one volt (the typical operating voltage of solid state electronic devices).
Since there is a lower limit to the size and capacitance of circuit elements that can be
fabricated on a chip using optical techniques, there is a lower limit to the energy associated
with storing one bit. That limit turns out to be the above mentioned 10^-12 joules, and the
current style of computer logic causes that entire energy to be dissipated each time the state
of a bit is changed[25]. The resulting heat load is a major barrier to high speed computation.
A major question is the extent to which this dissipation is an inescapable concomitant of
computation and to what extent it is due to "inefficient" physical or logical design of the
computer[26]. Information theoretic/thermodynamic arguments have been used to suggest that
there is a fundamental dissipation limit of kT per operation for computers designed on current
principles.

⁴ Apart from some interesting "existence proof" work of Fredkin et al.
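
The conversion between joules and units of kT quoted above is easy to check. The short Python sketch below does so, assuming a room temperature of 300 K (the text does not give a value) and using the standard value of the Boltzmann constant; the function name `energy_in_kT` is hypothetical.

```python
K_BOLTZMANN = 1.380649e-23  # Boltzmann constant in joules per kelvin


def energy_in_kT(energy_joules, temperature_kelvin=300.0):
    """Express an energy as a multiple of kT at the given temperature.

    The default of 300 K is an assumed value for "room temperature".
    """
    return energy_joules / (K_BOLTZMANN * temperature_kelvin)


# Dissipation per logical operation quoted in the text: 10^-12 joules.
ratio = energy_in_kT(1e-12)
```

Under that assumption `ratio` comes out at a few times 10^8, consistent with the order-of-magnitude figure of 10^8 kT.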

In thermodynamics there is a well-known connection between dissipation and the reversible
operation of heat engines. Standard computer logic elements, the NAND gate in particular, are
not even reversible as abstract logical operations, let alone as physical devices. It has been
suggested that if reversible logic functions are used, it is in principle possible to do
computing with zero dissipation[27, 28]! In this scenario, the entire computing operation would
have to be carried out reversibly in analogy with the dissipationless operation of a reversible
heat engine. It is hard to evaluate the relative merits of two schemes which promise to reduce
dissipation to 0·kT (the demand limit for reversible logic) and 1·kT (the demand limit for
standard logic) per operation, respectively, when the best dissipation achieved to date is
10^8 kT! We think it is worthwhile to pursue the reversible logic scenario, not so much because
it promises superior practical benefits, but rather because it raises unfamiliar questions
about the nature of computing and suggests some interesting new approaches to the physical
realization of computation.

There are two types of questions which arise when you pursue this line. First, there is the
question of what are useful reversible-logic functions, how they might be tied together to make
a useful computer and how such a computer might be programmed. These questions are all
answerable in the abstract, without any reference to the physical realization of the system.
This sort of question is the major subject of the work of Fredkin and other pioneers in
reversible logic and the results are that manageable reversible-logic computers can be designed
although they are in many interesting ways different from conventional computers. The second
question has to do with physical realization of reversible computation: What sort of physical
system can be used, what calculation speeds can be achieved, etc.? Here very little is known,
although many interesting questions arise. We think this is the most important aspect of the
reversible computation problem and have attempted to construct a framework for a serious
exploration of these questions.

4.3.  Physical  Realizations  of  Computers 

To establish a useful framework for our discussion it is helpful to remark that there are at
least two broad classes of physical realizations of computing machines. The most important
distinction is between open (dissipative) systems and closed (conservative) systems. The
distinction is between systems in which the computational degrees of freedom are coupled to a
"heat bath" with which energy can be exchanged and systems in which the computational degrees
of freedom are effectively isolated from the rest of the world. The other essential distinction
is between systems in which the computational degrees of freedom can be described classically
versus those in which they must be described quantum mechanically.

A  dissipative  system  will  behave  in  many  respects  like  a  heat  engine.  In  particular  it  should 
be  possible  to  design  it  so  that  it  is  more  and  more  reversible  and  less  and  less  dissipative  the 
slower  it  runs.  This  suggests  an  interesting  tradeoff  between  dissipation  and  speed  of  operation 
about  which  we  will  be  more  quantitative  in  the  next  section.  (The  logical  architecture  of  such  a 
machine  could  be  either  reversible  or  not.) 

A  conservative  system  is  necessarily  reversible  because  any  closed  Hamiltonian  system  is 
reversible.  In  fact,  it  is  physically  reversible  whatever  its  speed  of  operation  and  it  would  hardly 
make  sense  for  the  logical  architecture  of  such  a  machine  to  be  anything  other  than  reversible! 

Any device in which the computational degrees of freedom are realized on a scale much
larger than atomic size will inevitably be dissipative: the total number of physical degrees of
freedom vastly outnumbers those directly involved in computation, and it is impossible to
prevent leakage of energy between the computer and the "heat bath". This is the case with all
present-day machines.

On the other hand, if the computational system were realized at the atomic scale, as some
kind of cleverly constructed lattice, for instance, then the computational degrees of freedom
would be a major fraction of the total number of degrees of freedom. In that case, the system
might function as a good approximation to a closed reversible Hamiltonian system and the
choice of reversible logic structure would be essential. Needless to say, no one has any
practical ideas on how to realize such a computing system, though of course, the entire thrust
of the development of faster computation is toward physically smaller computing elements. The
point is that if atomic scale computing elements are ever achieved, reversible logic ideas may
be most appropriate for doing computation.

The other important dichotomy in thinking about physical realizations of computers is that
between classical and quantum mechanical systems. This leads to a two-by-two classification
scheme which is shown in Table 7.


Table 7. "Two-By-Two" Classification Scheme.

                       OPEN                     CLOSED
Classical              Conventional Machines    Fredkin's Billiard Ball Machine
Quantum Mechanical     Josephson Junction       Future Atomic Scale Machines ?
                       MACROSCOPIC              MICROSCOPIC

Current computers are macroscopic and therefore classical and dissipative. Computers
constructed at the atomic scale are surely quantum-mechanical and might well, for the reasons
discussed earlier, be effectively closed, reversible systems.

Non-dissipative classical systems are consistent with Newtonian mechanics and represent
internally consistent idealized systems which turn out to be a useful framework for
demonstrating general features of reversible computation. We will be discussing Fredkin's
billiard ball model in that light. Finally, there exist macroscopic (i.e., dissipative) but
quantum-mechanical logic devices based on the Josephson junction which we will use to
illustrate more precisely the theoretical limits on dissipative devices.

5. Theoretical Limits For Dissipative Machines

The single junction superconductor interferometer provides an example of a dissipative logic
device whose properties can be quantitatively analyzed in some detail. In this section we
summarize the results of Likharev[29] on the device schematized in Figure 2. It consists of a
superconducting ring, broken by a Josephson junction (the cross in the figure), with provision
for controlling the maximum current, $I_M$, that can flow through the junction by varying an
external current, $I_c$. The superconducting ring is subject to an external magnetic field with
a flux, $\Phi_e$, through it. The ring carries a current, $I$, and has a net flux, $\Phi$, due
to the combined effects of $I$ and $\Phi_e$, through it. If the self-inductance of the ring is
$L$, the net flux satisfies

$\Phi = \Phi_e - L I .$

The net flux, $\Phi$, is proportional to $\delta$, the difference across the junction of the
superconducting order parameter phase, and can be thought of as the variable describing the
"state" of this system. To be precise, $\delta = 2\pi \Phi / \Phi_0$, where

$\Phi_0 = h / 2e$

is the magnetic flux quantum. The system is made to function as a logic device by manipulating
the state variable through changes in the external parameters $I_c$ and $\Phi_e$.

Figure 2. Josephson Junction Logic Device.

The energy of this system is the sum of the magnetic field energy,

$U_M = \frac{(\Phi - \Phi_e)^2}{2L} ,$

and the energy of a junction with a phase difference $\delta$ across it,

$U_J = + \frac{I_M \Phi_0}{2\pi} \cos\left( 2\pi \Phi / \Phi_0 \right)$

($I_M$ is the maximum junction current, which, as we have said, can be manipulated from the
outside). The total energy functional,

$U(\Phi) = U_M + U_J ,$

generically has two minima. The situation when $\Phi_e = 0$ and $I_M > 0$ is shown in Figure 3.
This two-fold degeneracy of the lowest energy state can be used in principle to store one
binary bit of information.
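
The appearance of two degenerate minima can be explored numerically. The Python sketch below is illustrative only: it assumes a total energy of the form quadratic magnetic term plus a cosine junction term with a positive coefficient, written in dimensionless form with x = Φ/Φ0, x_e = Φe/Φ0 and a hypothetical coupling parameter beta = 2πL·I_M/Φ0. The sign convention, the parameter names, and the grid search are assumptions of this sketch, not taken from Likharev's paper.

```python
import math


def potential(x, x_e, beta):
    """Assumed dimensionless energy: (x - x_e)^2 + (beta / (2 pi^2)) cos(2 pi x)."""
    return (x - x_e) ** 2 + beta / (2 * math.pi ** 2) * math.cos(2 * math.pi * x)


def local_minima(x_e, beta, lo=-1.5, hi=1.5, n=3001):
    """Return grid points that lie strictly below both of their neighbours."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    us = [potential(x, x_e, beta) for x in xs]
    return [xs[i] for i in range(1, n - 1) if us[i] < us[i - 1] and us[i] < us[i + 1]]


# Strong junction coupling: the point x = 0 becomes a barrier between two wells.
double_well = local_minima(x_e=0.0, beta=3.0)

# Weak coupling: the quadratic term dominates and a single well remains.
single_well = local_minima(x_e=0.0, beta=0.5)
```

With these assumed parameters, raising or lowering `beta` (that is, the maximum junction current) is what creates or removes the barrier, which is the handle used for switching in the discussion that follows.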


Better yet, we can, by changing the external parameters, $I_M$ and $\Phi_e$, manipulate the
shape of the potential in such a way as to smoothly switch the system point from one degenerate
ground state to the other. This gives an explicit way of switching our bit-storage device, or
carrying out an elementary logical operation. A possible switching sequence is shown in
Figure 4, where the system point (the heavy dot) starts in the right-hand well and finishes in
the left-hand well. In this sequence, the system point always sits at a local potential minimum
and the rate of change of the system coordinate is always completely controlled by the external
parameters and can be made as small as we like at the price of dragging the switching event out
over a longer and longer time. In Figure 5 we display a switching sequence where this is not
true. In the third step of the sequence, when the barrier finally disappears, the system point
is at a large positive energy with respect to the left-hand minimum. It will roll down the hill
and eventually settle down in the left-hand minimum only after dissipating its extra energy.
The rate of this motion and the energy dissipated in it are not controllable from the outside,
and to minimize dissipation in switching we must avoid this sort of sequence.

Figure 4. Switching Sequence.

Figure 5. Dissipative Switching Sequence.


We finally come to the quantitative evaluation of dissipation in the switching event. This
device has many more co-ordinates than the single flux co-ordinate $\Phi$ in which we are
primarily interested. The effect of these degrees of freedom can be summarized by a viscous
force

$F_v = -K \dot{\Phi}$

which leads to damping of motions of the system co-ordinate (and dissipation of energy from the
$\Phi$ degree of freedom) at a rate determined by $K$. The total energy loss in some time
evolution of $\Phi$ is just

$W = -\int dt \, F_v \dot{\Phi} = K \int dt \, \dot{\Phi}^2 > 0 .$

It is particularly convenient to characterize the damping by the time $\tau_c$ it takes small
amplitude oscillations about a minimum to decay by $e^{-1}$, instead of by $K$. In either of
the switching scenarios described above, $\dot{\Phi}$ necessarily is non-zero and there is
necessarily some dissipation. The shape of the potential during the switching event is
constrained by the requirement that spontaneous switching into the wrong well due to classical
thermal fluctuations must be negligible (this means that the energy barrier between the two
local minima must always be much greater than kT).
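
The way viscous dissipation falls as the switching is slowed can be seen by integrating K times the squared rate of change of the flux over a prescribed trajectory. The Python sketch below assumes a smooth hypothetical ramp of the flux from 0 to a total change `delta_phi` over a time `tau`; the ramp shape, the function names, and the unit choices are assumptions made for illustration, and the thermal barrier constraint is not modelled.

```python
def smooth_ramp(s, delta_phi=1.0):
    """Assumed flux trajectory: rises smoothly from 0 to delta_phi as s goes 0 -> 1."""
    return delta_phi * (3 * s ** 2 - 2 * s ** 3)


def dissipated_energy(K, tau, steps=2000):
    """Approximate W = K * integral of (dPhi/dt)^2 dt over a switch lasting tau."""
    dt = tau / steps
    total = 0.0
    for i in range(steps):
        d_phi = smooth_ramp((i + 1) / steps) - smooth_ramp(i / steps)
        total += K * (d_phi / dt) ** 2 * dt  # viscous loss during this time step
    return total


fast = dissipated_energy(K=1.0, tau=1.0)
slow = dissipated_energy(K=1.0, tau=10.0)
```

For a fixed trajectory shape the loss scales as 1/tau, so `slow` is one tenth of `fast`: stretching the switching event over a longer time reduces the energy dissipated in proportion.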

Given this information, it is a straightforward matter to calculate the minimum energy
dissipation (corresponding to the sequence of Figure 4) in a switching event carried out in a
time interval $\tau$. The result is, roughly,

$W \approx kT \, \frac{\tau_c}{\tau}$

so long as $\tau \gg \tau_c$. In other words, the total energy dissipated in a switching cycle
can be made as small as we like by making the switching time arbitrarily long compared to the
basic dissipation time scale. This is analogous to the situation with heat engines: dissipation
or entropy production can be made arbitrarily small by running the engine arbitrarily slowly.
We can also determine the energy dissipated in a switching cycle like that of Figure 5. In that
case it turns out that $W \approx kT$ no matter how slowly we carry out the transition (at some
stage the system executes free fall down a potential hill whose height is scaled by kT so that
the system must dissipate energy of order kT to come into equilibrium).

When this sort of device is used to make a computer, the question of overall logical
organization inevitably arises. It turns out that if we use the conventional organization based
on (logically irreversible) NAND gates (which can be simulated by appropriately connecting
together several of the above-described switches), then switching cycles of the type of
Figure 5 are inescapable and dissipation at the rate of roughly kT per operation is the
theoretical limit. However, if a logically reversible organization is used, it turns out that
only switching sequences of the type of Figure 4 need be encountered and the dissipation per
operation can be reduced arbitrarily below kT, at the price of reducing the rate of
computation. Since the motivation for reducing dissipation was to increase the rate of
computation, this seems rather self-defeating. Later on we will discuss possibilities in which,
at least in principle, dissipationless reversible computation can be carried out at arbitrary
speed. In the next section we will finally make explicit what we mean by reversible logical
architecture and devices.

5.1.  Abstract  Issues 

It is known that a computer can be built entirely out of a Boolean logic device called a NAND
gate. The action of such a device is symbolized in Figure 6. The inputs a and b take on the
values 0 or 1, as does the output. The output is computed by the function $\overline{ab}$, where
the bar means logical "not" ($\bar{0} = 1$, $\bar{1} = 0$). This logical function is clearly not
reversible or invertible, since several input states produce the same output state. For this
reason, a conventional computer cannot be run backwards. The previous section implies that the
operation of a physical NAND gate entails a dissipation of at least kT per operation.

Figure 6. NAND Gate.

The discussion of the logical organization of strictly reversible computers was initiated by
Bennett in 1973[27]. In pursuing this subject, Fredkin[28] developed a simple abstract reversible
logical function which gave promise of being a universal building block for reversible computers.
The structure and action of this function, called the Fredkin gate, is shown in Figure 7. As in the
case of the NAND gates, the input and output lines take on the values 0 or 1. An examination of
the truth table for this device shows that it is invertible: the correspondence between input and
output states is one-to-one.

Figure 7. Fredkin Gate. Inputs: $a$, $b$, $c$. Outputs: $a$, $ab + \bar{a}c$, $\bar{a}b + ac$.

By ignoring some outputs and fixing some inputs the Fredkin gate can be made to perform
any standard logical function. For instance, the AND of a and b can be obtained by setting c = 0
and keeping only the middle output line, as in Figure 8. This procedure requires a supply of
input constants and a way of disposing of the unwanted outputs, known as "garbage". The brute
force method of carrying out reversible computation is to record every one of the garbage bits
produced during a computation. This is not a very satisfactory procedure, since the number of
elementary logical operations required to carry out even a simple arithmetic operation, let alone
a complicated program, is enormous and memory resources would be swamped.
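The properties claimed for the Fredkin gate can be checked by enumeration. The following is a minimal sketch in Python, assuming the output labelling a, ab + (not a)c, (not a)b + ac; the function names are illustrative and do not come from the report.

```python
from itertools import product

def fredkin(a, b, c):
    """Outputs (a, ab + (not a)c, (not a)b + ac): a = 1 passes b, c; a = 0 swaps them."""
    not_a = 1 - a
    return (a, (a & b) | (not_a & c), (not_a & b) | (a & c))

def nand(a, b):
    """Conventional, logically irreversible NAND, for comparison."""
    return 1 - (a & b)

def and_via_fredkin(a, b):
    """AND(a, b): fix the input constant c = 0 and keep only the middle output line."""
    _, middle, _ = fredkin(a, b, 0)
    return middle  # the two discarded outputs are the "garbage"

inputs3 = list(product((0, 1), repeat=3))
fredkin_outputs = {fredkin(*bits) for bits in inputs3}
nand_outputs = {nand(a, b) for a, b in product((0, 1), repeat=2)}

if __name__ == "__main__":
    print("distinct Fredkin outputs:", len(fredkin_outputs), "of", len(inputs3), "inputs")
    print("distinct NAND outputs:", len(nand_outputs), "of 4 inputs")
```

Eight distinct outputs for eight inputs is the one-to-one property; the NAND gate maps four input states onto only two outputs, which is why it cannot be run backwards.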

Fredkin, Toffoli and students[30, 31] have shown how to get round this problem by really
making use of the reversibility of the system. The point is that if one is doing some machine
instruction, such as computing the sum of two numbers, which involves a large number of logical
operations, one may: a) do the calculation, producing a large quantity of garbage, b) record the
result, producing a very small amount of garbage, c) run the computation backwards, eating up
the garbage produced in a). If the machine instruction itself is logically reversible, one
doesn't even have to accumulate garbage in step b). The only true garbage which needs special
memory allocation and has to be kept to the end of the program is that associated with truly
non-invertible machine instructions. By careful design of the machine instruction set and
programming practices, it appears possible to reduce the garbage accumulated in a typical
program to a manageable size. We are not aware of a quantitative answer to the question: if a
program requires a total of N steps to execute, what is the minimum number of garbage bits that
must be accumulated? We suspect that the answer is log N, which would mean that only a trivial
amount of memory has to be devoted to true garbage accumulation, but we don't have a proof.

Figure 8. AND (a,b).
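Steps a) to c) can be illustrated on a toy "instruction". The sketch below, in Python, computes the AND of several bits with a chain of Fredkin gates, copies the result onto a blank bit, and then runs the chain backwards. The register layout, the function names, and the use of an XOR as the reversible copy are assumptions made for illustration; this is not the construction of the cited references.

```python
from itertools import product

def fredkin(a, b, c):
    """Fredkin gate; applying it twice restores its inputs."""
    not_a = 1 - a
    return (a, (a & b) | (not_a & c), (not_a & b) | (a & c))

def run_gates(reg, n, order):
    """Apply the gate chain in the given order; gate i acts on wires i-1, i and constant n+i-1."""
    for i in order:
        ctrl, tgt, const = i - 1, i, n + i - 1
        reg[ctrl], reg[tgt], reg[const] = fredkin(reg[ctrl], reg[tgt], reg[const])

def reversible_and(bits):
    """Return (register after step a, register after step c) for AND of all bits."""
    n = len(bits)
    reg = list(bits) + [0] * (n - 1) + [0]      # inputs, input constants, blank result bit
    run_gates(reg, n, range(1, n))              # a) compute, scattering garbage over the wires
    after_forward = list(reg)
    reg[-1] ^= reg[n - 1]                       # b) copy the result onto the blank bit
    run_gates(reg, n, reversed(range(1, n)))    # c) run backwards, eating up the garbage
    return after_forward, reg

if __name__ == "__main__":
    mid, final = reversible_and((1, 0, 1, 1))
    print("after a):", mid)
    print("after c):", final)
```

After step c) the inputs and the constants are back to their initial values and only the result bit has changed, so nothing has to be stored beyond the answer itself.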

Finally, as a result of this experience, Fredkin and students have been able to produce
sketchy but credible designs for real computers. These designs are explicit two-dimensional
wiring diagram layouts of Fredkin gates, and have been demonstrated in computer simulation
exercises to work as expected.

To  summarize,  although  computers  based  on  reversible  logic  elements  have  some  unfamiliar 
features,  machines  whose  effective  operation  is  nearly  a  carbon  copy  of  conventional  computers 
can  be  laid  out  as  explicit  two-dimensional  hook-ups  of  the  logically  reversible  Fredkin  gate.  In 
the  next  section  we  will  take  up  the  question  whether  the  Fredkin  gate  is  physically  realizable. 

5.2.  Physical  Realization  of  the  Fredkin  Gate 

In order to give an existence proof for reversible computation, Fredkin has introduced a
stylized model based on perfectly elastic collisions of billiard balls moving on a frictionless
plane[28]. Consider a two-dimensional square grid as laid out in Figure 9, with unit spacing
between the grid points, and identical hard spheres of radius $1/\sqrt{2}$ moving at one lattice
spacing per time step along the principal directions of this lattice. At time t = 0, the center
of every ball lies on a grid point, and that will again be true at every integer-valued time. Balls
will occasionally undergo right-angle elastic collisions at integer-valued times (see b). The balls
emerging from the collision will again move along the principal lattice directions and their centers
will coincide with lattice points at integer-valued times. At some lattice points a billiard ball will
be nailed down to function as a perfect reflector of anything that comes by. The presence or
absence of a billiard ball at a lattice site at an integer time can be taken as a binary bit of
information, and the Newtonian evolution of such a system of billiard balls amounts to a
"calculation" involving those bits.

Figure 9. Square Grid.

The construction of the Fredkin gate goes in two steps. First construct the gate shown in
Figure 10, where the bar represents a fixed reflector. This device lets a ball on the x path go
through undeflected if no ball is on the c path, but switches it onto a different path if a ball is
simultaneously present on the c path (and lets the c ball through undeflected). The information
processing here amounts to switching bits between two output paths, depending on the contents of
a control path. If these interaction gates are strung together according to Figure 11 (where the
connecting paths have appropriate delays in them to maintain proper synchronization), it is
possible to verify that the overall system functions exactly like the Fredkin gate.
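The logic of the interaction gate just described can be written as a small truth table. This is a sketch in Python of the logical behaviour only, not of the ball trajectories; the output names `x_straight` and `x_switched` are labels introduced here, and the wiring of Figure 11 is not reproduced.

```python
from itertools import product

def switch_gate(c, x):
    """Return (c_out, x_straight, x_switched) as ball occupancies (0 or 1).

    The c ball always passes through; the x ball stays on its own path
    when c is absent and is switched to the other path when c is present.
    """
    return (c, x & (1 - c), x & c)

def switch_gate_inverse(c_out, x_straight, x_switched):
    """Recover the inputs (c, x) from any output the gate can produce."""
    return (c_out, x_straight | x_switched)

table = {(c, x): switch_gate(c, x) for c, x in product((0, 1), repeat=2)}

if __name__ == "__main__":
    for (c, x), out in table.items():
        print(f"c={c} x={x} -> {out}")
```

The table shows the two features that matter for reversibility: distinct inputs give distinct outputs, and the number of balls going in equals the number coming out.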

According to the previous section, a useful reversible computer can be made by wiring
together enough Fredkin gates. The same computer can therefore be realized as a two-dimensional
arrangement of appropriately aimed and placed billiard balls and reflectors. The execution of a
program on such a computer is just the carrying out of the Newtonian time evolution of the
mechanical system.

By construction, this system is dissipation-free and, since the billiard ball velocity is
arbitrary, it can operate at any speed we like. This amounts to an existence proof for
dissipation-free, fast computing via a classical conservative system.

5.3.  Billiard  Ball  Machine  as  Cellular  Automaton 

The defects of the billiard ball model as a practical physical realization of reversible
computing are fairly obvious. It does, however, have the virtue of suggesting a different abstract
framework within which some interesting new possibilities for physical realization suggest
themselves.




The essence of the billiard ball model is that at integer time steps billiard balls are located
at lattice points only, and the pattern of occupied lattice sites changes from one time step to the
next according to some rule. The rule is not made explicit, but is the result of evolving the
previous configuration according to Newtonian mechanics. The step-by-step evolution of the state
of a lattice according to a local rule is the subject of cellular automaton theory, a particularly
active branch of fundamental computer science. It is natural to ask whether the essence of the
billiard ball model can be captured in some cellular automaton rule. For the moment, this is just
an idle question, but in the next section we will see that the cellular automaton framework is one
into which it might be possible to fit real atomic physics.

There is indeed a cellular automaton version of the billiard ball machine, which we have
reconstructed from remarks of Fredkin (the precise rule to be used is, we believe, due to
Margolus). Consider a lattice divided up into individual cells by solid and dotted lines in the
manner of Figure 12. Some of the cells are occupied and we want to devise a transition rule to
cause the pattern of occupation to change. If we look at the unit cells defined by the solid lines
alone or the dotted lines alone, we see that they each contain four of the unit cells of the full
lattice. The transition rule will be defined for such groups of four cells and applied alternately
to the groups defined by the solid lines and dotted lines. The transition rules we will use are
defined in Figure 13. Rotations of the rules presented are also valid. The transformation effected
by these rules is obviously one-to-one within the group of four cells on which they act. By
extension, the action of these rules on the lattice as a whole is one-to-one and reversible.

Figure 12. Dotted lines.
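The alternating block update can be sketched in a few lines of Python. Since the rule table of Figure 13 is not reproduced in this copy, the sketch assumes the block rule commonly attributed to Margolus for the billiard ball automaton: a lone occupied cell moves to the opposite corner of its block, two cells on a diagonal move to the other diagonal, and every other block is left unchanged. That rule, the periodic 8 x 8 grid and all names below are assumptions for illustration.

```python
from itertools import product

def block_rule(block):
    """Update one 2x2 block given as (nw, ne, sw, se) occupancies (assumed rule)."""
    nw, ne, sw, se = block
    occupied = nw + ne + sw + se
    if occupied == 1:                    # lone cell: jump to the opposite corner
        return (se, sw, ne, nw)
    if occupied == 2 and nw == se:       # pair on a diagonal: switch to the other diagonal
        return (ne, nw, se, sw)
    return block                         # everything else is unchanged

def step(grid, offset):
    """Apply block_rule to every block of the partition starting at (offset, offset)."""
    n = len(grid)                        # n must be even; the grid is periodic
    new = [row[:] for row in grid]
    for r in range(offset, n + offset, 2):
        for c in range(offset, n + offset, 2):
            r0, r1, c0, c1 = r % n, (r + 1) % n, c % n, (c + 1) % n
            out = block_rule((grid[r0][c0], grid[r0][c1], grid[r1][c0], grid[r1][c1]))
            new[r0][c0], new[r0][c1], new[r1][c0], new[r1][c1] = out
    return new

def run(grid, offsets):
    """Alternate between the 'solid' (offset 0) and 'dotted' (offset 1) partitions."""
    for offset in offsets:
        grid = step(grid, offset)
    return grid

def occupied_cells(grid):
    return sorted((r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v)

start = [[0] * 8 for _ in range(8)]
start[0][0] = 1
forward = run(start, [0, 1, 0])          # three steps forward
backward = run(forward, [0, 1, 0])       # the same steps in reverse order undo them

if __name__ == "__main__":
    print("after three steps:", occupied_cells(forward))
    print("after reversing:  ", occupied_cells(backward))
```

Because the assumed block rule is its own inverse, running the same partitions in reverse order retraces the evolution, which is the lattice-wide reversibility described above.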

A bit of playing with the rules shows that single occupied cells propagate like billiard balls
in the manner indicated in Figure 14. Single occupied cells, however, do not collide with each
other in the manner of billiard balls. In order for this to work out properly, it is necessary to
consider a train of two similar occupied cells, as in Figure 15. This configuration and the three
other versions, corresponding to the other possible directions of motion, propagate and collide
exactly in the manner of billiard balls. One can also construct a configuration which does not
propagate and reflects any billiard ball configuration incident on it (Figure 16).


Figure  13.  Transition  Rules. 


Figure  14.  Propagation. 


Figure  16.  Non-Propagating  Configuration. 


As the previous sections have shown, an explicit reversible computer design is available once
we have "billiard balls" and "mirrors". Now that we know that our cellular automaton rules
produce these two types of object, it is possible, in a perfectly explicit way, to construct a
reversible cellular automaton computer. This is interesting because, as we shall argue in the next
section, the cellular automaton framework seems particularly well-suited to realization at the
atomic lattice scale.

5.4.  Notional  Atomic  Scale  Realizations 

We  have  argued  that  reversible  computing  ideas  are  likely  to  be  of  most  interest  in  the 
study  of  computers  realized  at  the  atomic  scale,  where  the  computational  degrees  of  freedom  are 
not  vastly  outnumbered  by  all  the  rest  and  a  computer  might  function  as  a  good  approximation 
to  a  conservative  Hamiltonian  system.  We  would  now  like  to  explore  a  framework  which  suggests 
that  cellular  automaton  rules  of  the  type  just  discussed  might  actually  be  realizable  at  the  atomic 
scale.  We  don’t  have  a  specific  practical  proposal,  but  rather  some  general  notions  about  the  sort 
of  physical  systems  which  it  might  be  profitable  to  explore. 

Under the right conditions, atoms or molecules will arrange themselves in a regular lattice.
For a bulk material, this lattice will be three dimensional, while for material adsorbed on a
convenient substrate the lattice will be two dimensional. Let us consider a two-layer (i.e.
essentially two-dimensional) lattice of the type displayed in Figure 17.

Figure 17. Two Layer Lattice.

The lattice sites in the two layers (foreplane and backplane) are distinguished by open and
filled circles. The basic idea is that the sites harbor some two-fold quantum mechanical degree of
freedom (such as a spin, the presence of an atomic excitation, etc.) which can be manipulated and
used as a token for computing. For convenience, we will refer to this degree of freedom as a spin,
although it need not actually be one.

There are interactions between "spins" at neighboring sites, and we have indicated the
desired pattern of interactions by dashed and wavy lines. They will cause the "spins" on the
sites to change with time, and our goal is to cause this time evolution to occur in a way which
carries out the cellular automaton rules discussed in the previous section. The simplest way to do
this is to imagine that all the wavy line interactions can be turned on or off simultaneously from
outside by some macroscopically controllable agency such as a laser pulse. Suppose that the wavy
line interactions can be turned on and then off in just such a way as to exchange spins between
the foreplane and backplane sites (each wavy line connects just one foreplane and one backplane
site). Suppose further that the dashed line interactions, which connect up cells of four sites, either
all in the foreplane or all in the backplane, can be turned on and then off in such a way as to
effect the transformation on spins corresponding to the cellular automaton rules of the previous
section. Then by alternately activating the dashed and wavy bonds one would effect the cellular
automaton rules as transformations on the "spins". Then, by the discussion of the previous
sections, this microscopic device could be made to function as a reversible computer.

If we think of the site variables as really being elementary spins, it is easy to see what is
involved in obtaining exchange. The most general interaction between two spins is

$$H_I = a(t)\,\vec{\sigma}_1 \cdot \vec{\sigma}_2 .$$

The bond strength, a, depends on t, since we must imagine being able to manipulate it from
outside. If we turn this bond on and then off in such a way that

$$\int_0^{\infty} dt\, a(t) = \pi$$

(a matter of properly tailoring the laser pulse, or whatever it actually is, that manipulates the
bond) then it is easy to show that the net effect is simply to exchange the spins between the two
sites. Although we have not done it explicitly, we believe it should be possible to construct a set
of bonds for four spins which can be manipulated in such a way as to carry out the desired
cellular automaton transformation.
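The exchange condition can be checked in a few lines. The following is a sketch rather than the authors' derivation: it assumes units with $\hbar = 1$ and uses the standard identity relating the product of Pauli matrices to the swap operator $P$ of the two spins. The symbols $P$, $U$ and $\phi$ are introduced here for illustration.

```latex
\begin{align}
\vec{\sigma}_1 \cdot \vec{\sigma}_2 &= 2P - 1, \qquad P^2 = 1, \\
U = \exp\!\Big(-i\,\phi\,\vec{\sigma}_1 \cdot \vec{\sigma}_2\Big)
  &= e^{i\phi}\,\big(\cos 2\phi - i\,P \sin 2\phi\big), \\
U &= e^{-i\pi/4}\,P \quad \text{when } \phi = \pi/4 .
\end{align}
```

Here $\phi$ is the time integral of the coefficient multiplying $\vec{\sigma}_1 \cdot \vec{\sigma}_2$. The required pulse area therefore depends on normalization: if the interaction is written with spin-1/2 operators $\vec{S} = \vec{\sigma}/2$, so that $H = a(t)\,\vec{S}_1 \cdot \vec{S}_2$, then $\phi = \tfrac{1}{4}\int_0^\infty dt\,a(t)$ and pure exchange (up to an overall phase) occurs at $\int_0^\infty dt\,a(t) = \pi$, as quoted in the text; with $a(t)$ multiplying Pauli matrices directly the same exchange occurs at a pulse area of $\pi/4$.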

If a scheme of the above type can be found, it suggests that a reversible atomic scale (and,
therefore, one might hope, very fast) computer could be built. The obvious challenge is to find
semi-realistic choices for sites, bonds and the extended driver of the bonds. We don't have any
concrete response to this challenge, but we think that materials questions of the kind raised here
are a rather natural sort of outcome of thinking about where reversible logic fits in the overall
scheme of computing concerns. We have been struck by the extent to which previous work on
reversible computing has focused on abstract questions and would strongly recommend that future
work begin to focus on physics questions. The framework we have presented is not necessarily
the best one, but does give a way of focusing on an interesting set of materials and physics
questions, and might have the virtue of stimulating thought.

5.5.  Quantum  Mechanics  Issues 

The previous discussions have not made much of the fact that physics at the atomic scale is
necessarily quantum mechanical. Indeed, the whole question of the role of quantum mechanical
effects in small-scale computing devices has been only very sketchily explored in the literature.
The scheme we have been discussing has one illuminating and bizarre quantum mechanical
feature which we will explain, just to give an idea of the sort of issues involved.

The bonds of our lattice cellular automaton are alternately switched on and off by some
external system which acts as a clock and driver for the whole system. This driver is itself some
mechanical system executing periodic motion; let us for definiteness take it to be a rotator of some
kind, rotating in some angular coordinate, $\theta$, such that every time $\theta$ passes through some
marker angle, $\theta_0$, the bonds responsible for switching spins on the lattice are activated.


We can write down a fairly explicit Lagrangian for this system:

$$L = \tfrac{1}{2}\dot{\theta}^{2} + \Big[\sum_{i=1}^{N} \vec{\sigma}_1^{(i)} \cdot \vec{\sigma}_2^{(i)}\Big]\,\pi\,\dot{\theta}\,\delta(\theta - \theta_0) + \cdots$$

The first term is just the rotator kinetic energy and says that, in the absence of other terms, the
system just executes uniform rotational motion. The next term describes the interaction with the
"wavy" bonds of the previous section: the spins are divided up into N pairs and the interaction of
each pair with $\theta$ is such as to effect the exchange transition every time $\theta$ passes through
$\theta_0$. The $\dot{\theta}$ factor ensures the same action on the spins no matter how fast $\theta$ is
moving. The dots indicate the terms, not yet specified but similar in nature, responsible for the
spin transformations on four spins at a time (needed to complete the cellular automaton rules).

In the classical approximation to the motion of $\theta$, the rotator proceeds at constant velocity
and one cellular automaton transformation is executed per cycle. The quantum-mechanical
version of the motion of $\theta$ is somewhat different. The rotator interacts with the computer
coordinates through the sum

$$\sum_{i} \vec{\sigma}_1^{(i)} \cdot \vec{\sigma}_2^{(i)} + \cdots$$

and, as the calculation proceeds, this sum takes on an essentially random sequence of values.
This is roughly equivalent to saying that $\theta$ is moving in a one-dimensional random potential.

In a random potential, there are no propagating states and all wave functions decay
exponentially with distance. If a computation takes N steps, we prepare the system in a state
localized around $\theta = 0$, and the computation is completed when $\theta$ is finally observed at
$2\pi N$. The exponential decay of wave functions probably means that the time to complete long
calculations increases exponentially with N. To know under what circumstances this would be a
practical problem, we would have to have a much more concrete model to work with. This
observation could be elaborated further, but is meant to give an example of the peculiar
phenomena that must be understood when we try to think about computing at the quantum
mechanical level.

6. CONCLUSIONS

6.1. Transformations

The idea of employing source program transformations is not new, but has received renewed
interest with the development of functional programming (the 'fp' language) and logic
programming (the 'PROLOG' language). As discussed above, this technique has an interesting, if
as yet unproved, potential.

Transformation  techniques  may  be  the  path  of  choice  for  Soviet  scientists.  The  Soviets 
have  well-known  problems  in  computer  hardware,  but  have  immense  talent  in  mathematics.  It 
just  may  be  that  the  break-through  needed  in  transformation  techniques  will  be  mathematical  in 
nature.  In  addition,  the  Soviets  have  concentrated  their  software  efforts  in  this  area.  There  are 
really  only  two  major  language/compiler  systems  that  have  been  developed  by  the  Soviets.  The 
rest  are  derivative  of  western  software  systems.  The  first  unique  Soviet  software  system  is  a 
language  for  program  development  and  is  not  of  particular  interest  here.  The  second  is  called 
’ANALYTIK’[32]  and  has  gone  through  at  least  three  major  revisions  since  1970.  Some  of  these 
can  be  traced  in  the  bibliography  of  Appendix  A-2.  An  example  of  output  from  ANALYTIK  can 
be found in Appendix A-3. A brief reading of a very restricted sample of the open Soviet
literature in this area did not reveal anything of especial interest, however.

In  general,  the  development  of  transformation  techniques  should  be  closely  followed.  Rapid 
progress could occur once the right idea is discovered. There is, of course, no guarantee that
this  will  occur  any  time  soon. 

6.2. Reversible Computing

The ideas in reversible computing are very immature at present. The potential side-benefits
from developments in this area could be very important, however, even if the main ideas are not
found to be feasible. The important areas to watch are technological. The key is some new
molecular-scale technology[33,34]. While there are developments in this area, they seem to be a
very long way from any practical system.

6.3. Final Remarks

We  have  discussed  two  ideas  about  how  a  radical  improvement  in  computer  performance 
might  come  about.  There  are  of  course  many  other  possibilities  as  well.  The  most  important 
would  be  methods  of  organizing  parallel  calculations.  This  is  an  old,  but  very  critical  problem. 
The development of computer system ideas is proceeding at a rapid pace. It will take
considerable effort to try to predict the likely direction of new developments.

APPENDIX A-1
RECURRENCE SOLVER


%  Solving  Recurrences: 

%  Assignment  3,  CS257 
%  Peter  Van  Roy 

%  Converts  all  functions  that  can  be  expressed  as  a  polynomial 
%  to  an  efficient  Horner  form. 

% The new form replaces the old in the PROLOG data base.
rgo :-
    solve(Func),                        % Get function to be solved.
    retract(solve(Func)),               % Remove it from data base.
    funclist(Func, 20, FuncList),       % Get first 20 function values.
    get_succ_diff(FuncList, DiffList),  % Calculate successive differences.
    conv_to_poly(DiffList, Poly/D),     % Convert to polynomial representation.
    horner(Poly, N, Horner),            % Convert polynomial to efficient Horner form.
    abolish(Func, 2),                   % Remove old definition from data base.
    NewFunc =.. [Func,N,F],             % Arrange the result to its final form.
    (D=1 -> Expr=Horner; Expr=Horner/D),
    NewClause = (NewFunc :- F is Expr),
    nl, write('The improved form for '), write(Func),
    write(' is: '), nl, write(NewClause), nl,
    assert(NewClause),                  % Insert the new form in the PROLOG data base.
    fail.                               % Continue with other functions.
rgo.

% Generate a list of values Func(i) for i=0, 1, ..., N-1.
funclist(Func, N, FuncList) :-
    funclist(Func, 0, N, FuncList).

funclist(Func, N, N, []) :- !.
funclist(Func, I, N, [F|FuncList]) :-
    Term =.. [Func,I,F],
    call(Term), !,
    I1 is I+1,
    funclist(Func, I1, N, FuncList).


% Get the first elements of all rows of successive differences
% down to the row of zeroes. This is enough to characterize
% the function completely.
get_succ_diff(Row, []) :- zero(Row), !.
get_succ_diff([A|Row], [A|DiffList]) :-
    next_row([A|Row], NextRow),
    get_succ_diff(NextRow, DiffList).

next_row([A,B|Row1], [D|Row2]) :- !,
    D is B-A,
    next_row([B|Row1], Row2).
next_row([_], []).

zero([0|List]) :- zero(List).
zero([]).


% Convert a representation of a function as a list of successive
% differences to a polynomial:

% Uses the recurrence: Poly = Ai + (N-i)/(i+1)*NextPoly.
conv_to_poly(DiffList, Poly/D) :-
    conv_to_poly(DiffList, 0, Poly/D).

conv_to_poly([Am], _, [Am]/1) :- !.
conv_to_poly([Ai|DiffList], I, Poly/D) :-
    I1 is I+1,
    conv_to_poly(DiffList, I1, NextPoly/DN),
    minus(NextPoly/DN, NegPoly/DN),
    mult(NegPoly/DN, I, Temp1/T1),
    add([0|NextPoly]/DN, Temp1/T1, Temp2/T2),
    div(Temp2/T2, I1, Temp3/T3),
    add([Ai]/1, Temp3/T3, Poly/D).


% Convert a polynomial in list form to a Horner's formula
% structure, using N as the variable:

% (The second, third, and fourth clauses are optimizations).
horner([A0], N, A0) :- !.
horner([An,1], N, N+An) :- !.
horner([0|Poly], N, Horner*N) :- !,
    horner(Poly, N, Horner).
horner([Ai|Poly], N, Ai) :-
    zero(Poly), !.
horner([Ai|Poly], N, Horner*N+Ai) :-
    horner(Poly, N, Horner).


% Polynomial arithmetic:

% Polynomials are represented as lists of integers divided by an
% integer. This avoids (1) round-off error in C-PROLOG, and
% (2) truncation on UNSW PROLOG.

% Addition of two polynomials:
add(Poly1/D1, Poly2/D2, Poly/D) :-
    gcd(D1, D2, G),
    D is (D1/G)*D2,     % D is lcm(D1,D2)
    F1 is D2/G,         % multiplying factor for Poly1's terms.
    F2 is D1/G,         % multiplying factor for Poly2's terms.
    addx(Poly1, Poly2, Poly, F1, F2).

addx([A1|Poly1], [A2|Poly2], [S|Poly], F1, F2) :- !,
    S is A1*F1+A2*F2,
    addx(Poly1, Poly2, Poly, F1, F2).
addx([], Poly2, Poly2, _, _) :- !.
addx(Poly1, [], Poly1, _, _).

% Change sign:
minus(Poly/D, Res/D) :- minusx(Poly, Res).

minusx([A|Poly], [R|Res]) :-
    R is -A,
    minusx(Poly, Res).
minusx([], []).


% Multiplication by a scalar:
mult(Poly/D, Scalar, Res/R) :-
    gcd(D, Scalar, G),
    R is D/G,
    S is Scalar/G,
    multx(Poly, S, Res).

multx([A|Poly], S, [P|Res]) :-
    P is S*A,
    multx(Poly, S, Res).
multx([], S, []).

% Division by a scalar:
div(Poly/D, Scalar, Res/R) :-
    D1 is Scalar*D,
    gcd([D1|Poly], G),
    (G \== 1 -> divlist(Poly, G, Res), R is D1/G;    % reduce by the common factor
     Res=Poly, R=D1).

divlist([A|Poly], S, [R|Res]) :-
    R is A/S,
    divlist(Poly, S, Res).
divlist([], _, []).

% gcd calculation:
gcd(X, 0, Y) :- !, X=Y.
gcd(U, V, X) :-
    W is U mod V,
    gcd(V, W, X).

% gcd of a list:
gcd([A], A) :- !.
gcd([A,B|List], 1) :-
    gcd(A, B, 1), !.
gcd([A,B|List], Ans) :-
    gcd(A, B, G),
    gcd([G|List], Ans).
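For readers who do not use PROLOG, the core of the solver (successive differences, conversion to a polynomial by the recurrence Poly = Ai + (N-i)/(i+1)*NextPoly, and Horner evaluation) can be sketched in Python. This is an illustrative re-expression, not a translation of the listing: it uses exact fractions in place of the integer-list-over-denominator representation, and the function names and the sum-of-squares example are chosen here.

```python
from fractions import Fraction

def leading_differences(values):
    """First element of each row of successive differences, down to the all-zero row."""
    row, leads = list(values), []
    while row and any(row):
        leads.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return leads

def differences_to_poly(leads):
    """Coefficients, lowest degree first, via Poly = Ai + (N - i)/(i + 1) * NextPoly."""
    poly = [Fraction(0)]
    for i in reversed(range(len(leads))):
        shifted = [Fraction(0)] + poly                    # N * NextPoly
        scaled = [-i * c for c in poly] + [Fraction(0)]   # -i * NextPoly
        poly = [(s + t) / (i + 1) for s, t in zip(shifted, scaled)]
        poly[0] += leads[i]
    while len(poly) > 1 and poly[-1] == 0:                # drop trailing zero coefficients
        poly.pop()
    return poly

def horner_eval(poly, n):
    """Evaluate the polynomial at n using Horner's rule."""
    acc = Fraction(0)
    for coeff in reversed(poly):
        acc = acc * n + coeff
    return acc

def sum_of_squares(n):
    """Example function: defined by a slow loop, but in fact a cubic polynomial in n."""
    return sum(k * k for k in range(n + 1))

samples = [sum_of_squares(i) for i in range(20)]   # first 20 values, as in the listing
leads = leading_differences(samples)
poly = differences_to_poly(leads)

if __name__ == "__main__":
    print("leading differences:", leads)
    print("coefficients:", [str(c) for c in poly])
    print("value at n = 100:", horner_eval(poly, 100))
```

The point of the transformation is visible in the example: a definition that costs n additions is replaced by a fixed cubic evaluated in three multiplications.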
CACHING SYSTEM


% CACHE:

% Generic program to cache heads of non-unit clauses.
% FORM: X(Y,Z):- ...
% Must fix up correct number of variables "Y,Z,...".

% The following compensate for principal functor
% not being a variable.

asf(f,Y,Z):-asserta((f(A,B):-eq(A,Y),eq(B,Z),!,fail)).
asf(a,Y,Z):-asserta((a(A,B):-eq(A,Y),eq(B,Z),!,fail)).
asf(r,Y,Z):-asserta((r(A,B):-eq(A,Y),eq(B,Z),!,fail)).

rtf(f,Y,Z):-retract((f(A,B):-eq(A,Y),eq(B,Z),!,fail)).
rtf(a,Y,Z):-retract((a(A,B):-eq(A,Y),eq(B,Z),!,fail)).
rtf(r,Y,Z):-retract((r(A,B):-eq(A,Y),eq(B,Z),!,fail)).

ass(f,YQ,ZQ):- asserta((f(YQ,ZQ):-cnt(f))).
ass(a,YQ,ZQ):- asserta((a(YQ,ZQ):-cnt(a))).
ass(r,YQ,ZQ):- asserta((r(YQ,ZQ):-cnt(r))).

rts(f,YQ,ZQ):- retract((f(YQ,ZQ):-cnt(f))).
rts(a,YQ,ZQ):- retract((a(YQ,ZQ):-cnt(a))).
rts(r,YQ,ZQ):- retract((r(YQ,ZQ):-cnt(r))).

% FAIL CACHING.

%asf(X,Y,Z):-asserta((X(A,B):-eq(A,Y),eq(B,Z),!,fail)).
%rtf(X,Y,Z):-retract((X(A,B):-eq(A,Y),eq(B,Z),!,fail)).

% Cache a fail.
fcache(X,Y,YP,Z,ZP):- nvbind(Y,YP,Z,ZP),
    cleanup(X),asf(X,Y,Z),!.

% Cleanup last cache insertion.
cleanup(X):- rtf(X,Y,Z), fremove(X,Y,YQ,Z,ZQ),asf(X,YQ,ZQ).
cleanup(X).

% Eliminate duplicates and submissive entries.
fremove(X,Y,YQ,Z,ZQ):-rtf(X,YP,ZP),
    switch(Y,YP,YR,Z,ZP,ZR),
    fremove(X,YR,YQ,ZR,ZQ),
    fconassert(X,YQ,YP,ZQ,ZP).
fremove(X,Y,Y,Z,Z).

fconassert(X,Y,YP,Z,ZP):- dominate(Y,YP),dominate(Z,ZP),!.
fconassert(X,Y,YP,Z,ZP):- asf(X,YP,ZP).


%  CACHE  SUCCESS. 

%ass(X,YQ,ZQ)>  asserta((X(YQ,ZQ))). 

%rts(X,YQ,ZQ):-  retract((X(YQ,ZQ))). 

%  Remove  failed  head  from  the  cache. 
scache(X,Y,YP,Z,ZP)>  fremv(X,YP,ZP),  ycache(X,Y,Z)(!. 


%  Remove  exact  fail  head. 
fremv(X,Y,Z)>  rtf(X,YP,ZP), 

fremv(X,Y,Z),  fcassert(X,Y,YP,Z,ZP). 
fremv(X,Y,Z). 

fcassert(X,Y,YP,Z,ZP):-  eq(Y,YP),eq(Z,ZP),!. 
fcassert(X,Y,YP,Z,ZP)>  asf(X,YP,ZP). 

%  Success  caching 

ycache(X,Y,Z)>  sremove(X,Y,YQ,Z,ZQ),ass(X,YQ,ZQ),!. 
%  Elim  Dups. 

sremove(X, Y,YQ, Z, ZQ ):-  rts(X,YP, ZP), 

switch(Y,YP,YR,Z,ZP,ZR), 

sremove(X,YR,YQ,ZR,ZQ), 

sconassert(X,YQ,YP,ZQ,ZP). 

sremove(X,Y,Y,Z,Z). 

%  Replace  head  in  cache. 
sconassert(X,Y,YP,Z,ZP)>  dominate(Y,YP), 
dominate(Z,ZP),!. 

sconassert(X,Y,YP,Z,ZP)>  ass(X,YP,ZP). 
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%  CACHE  UTILITIES. 

%  True  if  Y  &  YP  are  the  same. 
eq(Y,YP):-var(Y),var(YP),!. 
eq(Y,YP)>  Y==YP. 

%  True  if  Y  is  more  general  than  YP. 
dominate(Y,YP):-var(Y),!. 
dominate(  Y,  YP )  :-Y= = YP . 

%  Selects  most  general  set  of  terms. 
switch(Y,YP,YP,Z,ZP,ZP):-  dominate(YP,Y),
        dominate(ZP,Z),!.
switch(Y,YP,Y,Z,ZP,Z). 

%  Binds  only  non-variables. 

nvbind(Y,Y,Z,Z):-nonvar(Y),nonvar(Z),!. 

nvbind(Y,Y,Z,V):-nonvar(Y),!. 

nvbind(Y,U,Z,Z):-nonvar(Z),!.

nvbind(Y,U,Z,V). 


%  Call  functions  for  monitoring  and  caching. 

%  Head.

h(X,Y,Z):-cnt(X),inc(level),nl,
        count(level,N),tab(N),
        write(X),write('('),
        write(Y),write(','),
        write(Z),write(')'),!.

hs(X,Y,YP,Z,ZP):-  h(X,Y,Z). 

hf(X,Y,YP,Z,ZP):-  h(X,Y,Z),  confcache(X,Y,YP,Z,ZP),!. 

%  Tail. 

t(X,Y,Z):-count(level,N),tab(N),dec(level),
        write('  ,  success:  '),
        write(X),write('('),
        write(Y),write(','),
        write(Z),write(')'),nl,!.

ts(X,Y,YP,Z,ZP):-  t(X,Y,Z),  conycache(X,Y,Z).

tf(X,Y,YP,Z,ZP):-  t(X,Y,Z),  conscache(X,Y,YP,Z,ZP),!.


%  Cache  control. 

conycache(X,Y,Z):-cache(on),cache(X),  ycache(X,Y,Z),!. 
conycache(X,Y,Z). 

conscache(X,Y,YP,Z,ZP):-cache(on),cache(X),  scache(X,Y,YP,Z,ZP),!.
conscache(X,Y,YP,Z,ZP). 

confcache(X,Y,YP,Z,ZP):-cache(on),cache(X),  fcache(X,Y,YP,Z,ZP),!. 
confcache(X,Y,YP,Z,ZP). 


cache_on:-assert((cache(on))).

cache_off:-retract((cache(on))),!.
cache_off.


%=====  END  =====


%Program  commands.
go:-  cache_off,  w('orig.p'),
        p_compile(all),  w('compiled.p'),
        reset,
        rvt,  w('vt.p'),  reset,
        rtv,  w('tv.p').

go1:-cache_on,  w('orig.p'),
        rtv,w('tv.p'),reset,
        rvt,w('vt.p').

go2:-cache_on,  w('orig.p'),
        rvt,w('vt.p'),reset,
        rtv,w('tv.p').

w(N):-  tell(N),phead,lp,chead,
        tc,told,close(N).

rvt:-  tell(tracevt),thead,r(v,t),
        nl,nl,nl,nl,chead,
        lc,tc,told,close(tracevt).

rtv:-  tell(tracetv),thead,r(t,v),
        nl,nl,nl,nl,chead,
        lc,tc,told,close(tracetv).

tc:-count(m,M),count(c,C),count(f,F),
        count(a,A),count(r,R),
        T is M+C+F+A+R,
        write('Total  calls  =  '),
        write(T),nl.

%Input/output. 

phead:-write('  EXECUTED  PROGRAM  LISTING  '),nl,nl.
thead:-write('  TRACE  OF  PROGRAM  EXECUTION  '),nl,nl.
chead:-write('  CALL  COUNTS  '),nl,nl.

%General  purpose  counters.

count(X,0).

cnt(X):-inc(X). 

gencnt(0):-assertz((count(X,0))),!. 

gencnt(_). 

inc(X):-retract(count(X,N)),M  is  N  +  1,
        asserta(count(X,M)),gencnt(N),!.
dec(X):-retract(count(X,N)),M  is  N  -  1,
        asserta(count(X,M)),gencnt(N),!.
zero(X):-retract((count(X,M))).
reset:-zero(_),reset.
reset:-assert((count(X,0))).


%Program  listings.

lc:-listing(count).

lf:-listing(f).


la:-listing(a). 

lr:-listing(r). 

lp:-listing([m,c,f,a,r,count]). 
APPENDIX  A-4

AN  EXAMPLE  OF  SOVIET  WORK  IN  MACHINE 
SYMBOLIC  MANIPULATION 


KRUPNIKOV, ERNST DAVIDOVICH
POSTE RESTANTE
NOVOSIBIRSK 90
630090, SOVIET UNION


DOCTOR B. DAVID SAUNDERS
SIGSAM BULLETIN EDITOR
DEPARTMENT OF MATHEMATICAL SCIENCES
RENSSELAER POLYTECHNIC INSTITUTE
TROY, NEW YORK 12181, USA


DEAR DAVID,

THE INTEREST IN USING THE THEORY OF GENERALIZED HYPERGEOMETRIC FUNCTIONS
IN COMPUTER ALGEBRA ALGORITHMS IS INCREASING. I OFFER A PROBLEM FOR
CONSIDERATION BY READERS OF YOUR RESPECTABLE BULLETIN.

WHAT IS A MINIMAL SET OF IDENTITIES APPLICATION OF WHICH FACTORIZES
EACH OF THE FOLLOWING TEN FUNCTIONS INTO THE PRODUCT OF TWO
HYPERGEOMETRIC FUNCTIONS WITH LEAST NUMBER OF PARAMETERS?


[LIST OF TEN GENERALIZED HYPERGEOMETRIC FUNCTIONS; THE FORMULAS ARE ILLEGIBLE IN THIS COPY]


I BELIEVE IT WILL BE VERY INSTRUCTIVE TO BRING THE CORRESPONDING
PROGRAMS IN REDUCE-2, MACSYMA, [ILLEGIBLE]-71, AND OTHER HIGH-LEVEL
LANGUAGES INTO COMPARISON. IF YOU WISH I SHALL IMMEDIATELY AIRMAIL MY
PROGRAM IN [ILLEGIBLE]-71 TO YOU.

HAVE YOU RECEIVED MY LETTER OF JANUARY 26?

WITH WARMEST REGARDS,

ERNST OF NOVOSIBIRSK
ALGEBRA PROGRAMMER

CARBON COPIES TO PROFESSORS ANTHONY C. HEARN AND RICHARD J. FATEMAN

ENCLOSURE: TEST RUN OUTPUT WITH MY COMMENTS.